ABSTRACT

We investigate the error properties of certain galaxy luminosity function (GLF) estimators. Using a cluster expansion of the density field, we show how, for both volume and flux limited samples, the GLF estimates are covariant. The covariance matrix can be decomposed into three pieces: a diagonal term arising from Poisson noise; a sample variance term arising from large-scale structure in the survey volume; and an occupancy covariance term arising because galaxies of different luminosities inhabit the same cluster. To evaluate the theory one needs the mass function and bias of clusters, and the conditional luminosity function (CLF). We use a semi-analytic model (SAM) galaxy catalogue from the Millennium Run N-body simulation and the CLF of Yang et al. (2003) to explore these effects. The GLF estimates from the SAM and the CLF qualitatively reproduce results from the 2dFGRS. We also measure the luminosity dependence of clustering in the SAM and find reasonable agreement with 2dFGRS results for bright galaxies. However, for fainter galaxies, $L < L_*$, the SAM overpredicts the relative bias by ~10-20%. We use the SAM data to estimate the errors in the GLF estimates for a volume limited survey of volume $V \approx 0.13\,h^{-3}\,{\rm Gpc}^3$. We find that different luminosity bins are highly correlated: for $L < L_*$ the correlation coefficient is $r > 0.5$. Our theory is in good agreement with these measurements. These strong correlations can be attributed to sample variance. For a flux-limited survey of similar volume, the estimates are only slightly less correlated. We explore the importance of these effects for GLF model parameter estimation. We show that neglecting the bin-to-bin covariances induced by the large-scale structures in the survey can lead to significant systematic errors in the best-fit parameters. For Schechter function fits, the most strongly affected parameter is the characteristic luminosity $L_*$, which can be significantly underestimated.

Key words: Cosmology: large-scale structure of Universe. Galaxies: abundances. 
1 INTRODUCTION

The galaxy luminosity function (hereafter GLF) is one of the central pillars of modern observational cosmology. Commonly denoted $\phi(L)$, it gives the comoving space density of galaxies per unit luminosity interval, $L$ to $L+dL$. Its central importance stems from the following: it enables one to quantify the mean space density of galaxies in a patch of space; it provides a means for quantifying the evolution over time of the galaxy population in the Universe; it is one of the main tools for testing models of galaxy formation; and finally, it plays a central role in large-scale structure work, in the construction of mock galaxy catalogues and in sample weighting for clustering estimates.

There is a vast and rich literature on this subject that goes back to Hubble (1936); for a review of developments through to the mid 1990s see the reviews by Binggeli et al. (1988) and Strauss & Willick (1995) and references therein. Over the past decade the invention of massive multi-object spectrographs has revolutionised this area of research and has led to an explosion in the number of available redshifts with which to estimate the GLF: at low redshifts there have been the 2dFGRS (Folkes et al. 1999; Cole et al. 2001; Norberg et al. 2005), the SDSS (Blanton et al. 2001, 2002; Croton et al. 2003), and GAMA (Loveday et al. 2012) surveys; and at higher redshifts the VVDS (Ilbert et al. 2005), DEEP2 (Willmer et al. 2006; Faber et al. 2007) and zCOSMOS (Zucca et al. 2009) surveys.

Our current astrophysical understanding of what shapes the GLF is evolving rapidly, as our understanding of how galaxies form also rapidly improves (Kauffmann & Charlot 1998; Kauffmann et al. 1999; Cole et al. 2000; Benson et al. 2003; Croton et al. 2006; Bower et al. 2010). This in part owes to the large spatial volumes that can now be simulated with sufficiently high spatial resolution to follow the growth of dark matter haloes which may host faint galaxies (Springel et al. 2005). One important insight that has emerged is that there is a quantity more fundamental than the GLF, namely the conditional luminosity function (hereafter CLF; Yang et al. 2003; Cooray 2006). It encodes the idea that the probability of obtaining a galaxy of luminosity $L$ is conditioned on the mass $M$ of its host halo. This idea is supported by the finding that the GLF differs between dense and void regions (see for example Beijersbergen et al. 2003; Croton et al. 2005). This galaxy-halo connection then provides us with a means for connecting estimates of the GLF with the underlying large-scale structures (LSS).

Whilst the astrophysics that shapes the GLF has been widely studied, the statistical significance of GLF estimates is far less well understood. As we enter an era in which the parameters of 'good galaxy formation models' are to be compared with data, one needs a more concrete way of assessing the goodness of fit. Moreover, we would also like to be able to compare results from different surveys in order to draw conclusions about the evolution of the galaxy population. Again, this requires a more concrete method for interpreting features and differences. In this paper we aim to provide a theoretical framework within which one can calculate how large-scale structures impact not only the shape of the GLF, but also the statistical properties of its errors. In passing, we note that Trenti & Stiavelli (2008) explored how cosmic variance impacts the GLF parameters for deep high-redshift surveys. We also note that Robertson (2010) explored a Fisher matrix approach to forecasting the expected GLFs for future high-redshift surveys. He showed that sample covariance could correlate the galaxy counts in different magnitude bins. However, as we will show, these authors failed to capture the full story. We believe that the formalism presented herein goes some way beyond these earlier approaches.

The paper breaks down as follows: in §2 we present an overview of some commonly used GLF estimators. In §3 we examine the expectation and covariance of the GLF estimator for volume limited samples. In §4 we do the same, but for flux limited samples. In §5 we describe empirical results for the GLF, the SAM galaxy catalogues that we use, and the CLF model that we employ; here we also explore the luminosity dependence of galaxy clustering. In §6 we present our results for the error properties of the GLF in volume and flux-limited surveys. In §7 we explore the importance of including the full data covariance matrix for model fitting and parameter estimation. Finally, in §8 we summarise our findings and draw our conclusions.



2 ESTIMATING LUMINOSITY FUNCTIONS 
2.1 ΛCDM paradigm

Let us begin our theoretical development by following the standard paradigm for galaxy formation in a ΛCDM universe: we assert that galaxies can only form inside dark matter haloes, and that halo formation, and hence galaxy formation, takes place hierarchically. Thus, massive galaxies are assembled through the accretion and merger of smaller ones. Given a dark matter halo of mass $M$, the detailed theory of galaxy formation then tells us important information such as the number, luminosities and types of galaxies that form inside such haloes. This is of course a stochastic process, and the exact number will vary between haloes.

2.2 Overview of estimators

One of the most basic observational tools for testing galaxy formation models is the GLF. Over the years there have been many approaches to constructing estimators for the GLF. The simplest is to compute:

$$ {\rm E1:}\quad \hat{\phi}(L_\mu) = \frac{N_g(L_\mu)}{V_s\,\Delta L_\mu} \tag{1} $$

where $N_g(L_\mu)$ is the number of galaxies with luminosity in the bin $\Delta L_\mu$ centred on $L_\mu$, and $V_s$ is the total sampled survey volume.

For flux limited surveys this proves to be a biased estimator, since for faint galaxies the volume out to which one may observe these objects is significantly smaller than for bright galaxies. This can be corrected for by adopting the $1/V_{\max}$ estimator of Schmidt (1968):

$$ {\rm E2:}\quad \hat{\phi}(L_\mu) = \frac{1}{\Delta L_\mu}\sum_{i=1}^{N_g(L_\mu)}\frac{1}{V_{\max}(L_i)} \tag{2} $$

where $V_{\max}(L_i)$ is the maximum volume in which a galaxy with luminosity $L_i$ could have been found, given the flux limit of the survey $m_{\rm lim}$; for a narrow bin this is $\approx V^{\max}_\mu \equiv V_{\max}(L_\mu)$ (for further details see §4). For a discussion of estimators E1 and E2 see Felten (1976) and references therein.
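The two estimators differ only in how each galaxy is weighted. The following is a minimal sketch, assuming galaxies are supplied as plain lists of luminosities (and, for E2, precomputed per-galaxy $V_{\max}$ values); the function names are illustrative and not taken from any survey pipeline.

```python
def counts_in_bins_lf(lums, bin_edges, survey_volume):
    """Estimator E1, Eq. (1): phi_mu = N_g(L_mu) / (V_s * Delta L_mu).

    Bins are half-open intervals [lo, hi) defined by increasing bin_edges.
    """
    phi = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        n_in_bin = sum(1 for lum in lums if lo <= lum < hi)
        phi.append(n_in_bin / (survey_volume * (hi - lo)))
    return phi


def one_over_vmax_lf(lums, vmax, bin_edges):
    """Estimator E2, Eq. (2): phi_mu = (1 / Delta L_mu) * sum_i 1 / V_max,i.

    vmax[i] is the maximum volume in which galaxy i could have been observed.
    """
    phi = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        weight = sum(1.0 / v for lum, v in zip(lums, vmax) if lo <= lum < hi)
        phi.append(weight / (hi - lo))
    return phi


# Toy data (illustrative only): four galaxies in three unit-width bins.
lums = [1.2, 1.7, 2.5, 3.9]
edges = [1.0, 2.0, 3.0, 4.0]
phi_e1 = counts_in_bins_lf(lums, edges, survey_volume=10.0)
phi_e2 = one_over_vmax_lf(lums, vmax=[2.0, 2.0, 5.0, 10.0], bin_edges=edges)
print(phi_e1, phi_e2)
```

If every $V_{\max}$ is set equal to $V_s$, E2 reduces to E1, which is the volume limited case.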

It was noted that, for shallow and narrow surveys, estimators E1 and E2 would be 'biased' by the presence of large-scale over/underdense regions. Subsequently, a further set of estimators was developed to try to remove this so-called bias (Turner 1979; Sandage et al. 1979; Kirshner et al. 1979; Efstathiou et al. 1988). At the heart of these approaches is the assumption that the joint probability of obtaining a galaxy with luminosity $L_\mu$ in the interval $\Delta L_\mu$, and spatial position in the volume element $d^3x$, is the product of two independent probability density functions (PDFs):

$$ p(L_\mu,\mathbf{x})\,dL_\mu\,d^3x = p(L_\mu)\,p(\mathbf{x})\,dL_\mu\,d^3x \tag{3} $$

where the 1-point luminosity PDF is

$$ p(L) = \frac{\phi(L)}{\int_{L_{\min}}^{\infty} dL'\,\phi(L')} \tag{4} $$
where $L_{\min}$ is the lowest luminosity galaxy detectable in the sample volume, given the selection criteria. If $\mathbf{x}$ is the location of a random point, then the probability of finding a galaxy in a cell of volume $\delta V$ is given by:

$$ P(\mathbf{x}) = p(\mathbf{x})\,d^3x = N\,\delta V/V_s = \bar{n}\,\delta V \tag{5} $$

However, if one pre-selects a cluster region centred on $\mathbf{x}_c$, then the probability is enhanced: $P(\mathbf{x}|\mathbf{x}_c) = \bar{n}\,\delta V\,[1+\xi_{gc}(r)]$, where $\mathbf{r} = \mathbf{x}-\mathbf{x}_c$ and $\xi_{gc}(r)$ is the cross-correlation function between the cluster centre and the galaxies in the cluster (Peebles 1980). Then, for example, for estimator E1 the luminosity function estimate would be:



$$ {\rm E1:}\quad \hat{\phi}(L_\mu) = \frac{N_g(L_\mu)}{V_s\,\Delta L_\mu} = \langle\phi(L_\mu)\rangle\left[1+\sigma^2\right] \tag{6} $$

where

$$ \sigma^2 \equiv \frac{1}{N_g}\sum_{i=1}^{N_g}\xi_{gc}(r_i) \tag{7} $$

Turner (1979) saw that, under the assumption of Eq. (3), if one constructed the following quantity, then the environmental dependence of the counts would drop out:

$$ {\rm E3:}\quad \frac{dN_g(L_\mu)}{N_g[>L_\mu,\,\chi\le\chi_{\max}(L_\mu)]} = \frac{\phi(L_\mu)\,\Delta L_\mu\times p(\mathbf{x})V_s}{\int_{L_\mu}^{\infty}dL'\,\phi(L')\times p(\mathbf{x})V_s} = \frac{\phi(L_\mu)\,dL_\mu}{\int_{L_\mu}^{\infty}dL'\,\phi(L')} \tag{8} $$

where $N_g[>L_\mu,\chi\le\chi_{\max}(L_\mu)]$ denotes the total number of galaxies brighter than $L_\mu$ with distance less than $\chi_{\max}(L_\mu)$. Unfortunately, estimator E3 is also biased: the real world is more complicated (see Cole 2011 for additional discussion of this). The bias can be attributed to the fact that $p(L,\mathbf{x})$ is not separable: bright/faint galaxies tend to inhabit high/low density environments (Norberg et al. 2002). To illustrate how this bias operates, consider the following toy example. Suppose our survey consists of two clusters at the same distance from the observer; let cluster one contain galaxies of luminosity $L_1$ and be of mass $M_1$, and let cluster two contain galaxies of luminosity $L_2 > L_1$ and be of mass $M_2 > M_1$. Since higher mass dark matter haloes have more extended profiles and are also more biased with respect to the underlying dark matter than lower mass haloes, we have $\xi_{gc}(r|L_2) > \xi_{gc}(r|L_1)$. On constructing Turner's estimator we find:

$$ \frac{dN(L_1)}{N[>L_1,\,\chi\le\chi_{\max}(L_1)]} = \frac{V_s\Delta L\,\langle\phi(L_1)\rangle[1+\sigma_1^2(L_1)]}{V_s\Delta L\,\left\{\langle\phi(L_1)\rangle[1+\sigma_1^2(L_1)] + \langle\phi(L_2)\rangle[1+\sigma_2^2(L_2)]\right\}} = \left\{1+\frac{\langle\phi(L_2)\rangle[1+\sigma_2^2(L_2)]}{\langle\phi(L_1)\rangle[1+\sigma_1^2(L_1)]}\right\}^{-1} \tag{9} $$

where in the above we have defined

$$ \sigma_i^2(L) \equiv \frac{1}{N(L)}\sum_{j=1}^{N(L)}\xi_{gc}(r_j|L) \tag{10} $$

Thus we see that, in this toy-model case, the estimator is biased low for the lower luminosity galaxies: because $\xi_{gc}(r|L_2) > \xi_{gc}(r|L_1)$ implies $\sigma_2^2(L_2) > \sigma_1^2(L_1)$, the ratio in Eq. (9) is smaller than the value $\langle\phi(L_1)\rangle/[\langle\phi(L_1)\rangle+\langle\phi(L_2)\rangle]$ that would be obtained in the absence of clustering.

In fact, as we will show in the following sections, the bias associated with estimators E1 and E2 approaches zero, provided that the sample volume is sufficiently large. For estimator E3, by contrast, one can see that because $\xi_{gc}(r|L_2) \ne \xi_{gc}(r|L_1)$ the estimator is biased. We reserve a more detailed study of Turner's estimator, and of the bias induced by neglecting density-luminosity correlations, for future work.



3 VOLUME LIMITED GALAXY SAMPLES

Let us consider the simplest estimator, E1, which one may apply to volume limited surveys. We are interested in computing its expectation and covariance.

3.1 Expectation of estimator

Consider some large cubical patch of the Universe, of volume $V_s$, containing $N_c$ clusters that possess some distribution of masses. Let us subdivide this set of clusters into $N_m$ mass bins, where the $\alpha$th mass bin contains $N^c_\alpha$ clusters. We denote by $N^g_{i\alpha}(L_\mu)$ the number of galaxies with luminosities between $L_\mu-\Delta L_\mu/2$ and $L_\mu+\Delta L_\mu/2$ that are hosted by the $i$th halo of the $\alpha$th mass bin.

With the above definitions, the GLF estimator E1 for volume limited samples can be written:

$$ {\rm E1:}\quad \hat{\phi}(L_\mu) = \frac{1}{V_s\,\Delta L_\mu}\sum_{\alpha=1}^{N_m}\sum_{i=1}^{N^c_\alpha}N^g_{i\alpha}(L_\mu) \tag{11} $$



We now wish to compute the expectation of this estimator. We write this as

$$ \langle\hat{\phi}(L_\mu)\rangle = \frac{1}{V_s\,\Delta L_\mu}\left\langle\sum_{\alpha=1}^{N_m}\sum_{i=1}^{N^c_\alpha}N^g_{i\alpha}(L_\mu)\right\rangle_{g,P,s} \tag{12} $$

where $\langle\ldots\rangle_{g,P,s}$ represents an average over the ensemble: the subscript $g$ denotes an average over the sampling distribution for placing galaxies into haloes; the subscript $P$ denotes an average over sampling clusters into the given realization of the density field; and the subscript $s$ denotes an average over the density fluctuations within the volume.

We shall assume that the number of galaxies occupying a given dark matter halo is a Poisson process:

$$ P(N^g_{i\alpha}\,|\,\lambda) = \frac{\lambda^{N^g_{i\alpha}}\exp[-\lambda]}{N^g_{i\alpha}!} \tag{13} $$

where $\lambda = \bar{N}_g(M_\alpha,L_\mu)$ is the expected number of galaxies in the $L_\mu$ luminosity bin for a halo of mass $M_\alpha$. In fact, the precise form of the sampling distribution is not of great concern; what matters is the independence of the distributions, i.e. the number of galaxies occupying a given cluster depends only on the physical properties of that cluster.

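One immediate consequence of this is that we may compute the average over the galaxy population separately, and hence write the expectation of E1 as:

$$ \langle\hat{\phi}(L_\mu)\rangle = \frac{1}{V_s\,\Delta L_\mu}\sum_{\alpha=1}^{N_m}\left\langle\sum_{i=1}^{N^c_\alpha}\langle N^g_{i\alpha}(L_\mu)\rangle_g\right\rangle_{P,s} = \frac{1}{V_s\,\Delta L_\mu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\,\langle N^c_\alpha\rangle_{P,s} \tag{14} $$

where in the last step we identified $\bar{N}_g(M_\alpha,L_\mu) = \langle N^g_{i\alpha}(L_\mu)\rangle_g$, the expected number of galaxies with luminosity in the interval $[L_\mu-\Delta L_\mu/2,\,L_\mu+\Delta L_\mu/2]$ that occupy a cluster of mass $M_\alpha$.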

In order to proceed further, we need to compute the expected number of clusters in the $\alpha$th mass bin, $\langle N^c_\alpha\rangle_{P,s}$. This may be done following the procedure described in Smith & Marian (2011) (summarised in the Appendix for convenience). Following this procedure gives:

$$ \langle N^c_\alpha\rangle_{P,s} = V_s\,\bar{n}_\alpha \tag{15} $$

where the number density of clusters in the $\alpha$th mass bin is

$$ \bar{n}_\alpha = \int_{M_\alpha-\Delta M_\alpha/2}^{M_\alpha+\Delta M_\alpha/2} dM\,n(M) \tag{16} $$

and where $n(M)\,dM$ is the abundance of dark matter haloes in the mass interval $[M-dM/2,\,M+dM/2]$. On inserting this expression into Eq. (14) we find:



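$$ \langle\hat{\phi}(L_\mu)\rangle = \frac{1}{\Delta L_\mu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\,\bar{n}_\alpha \tag{17} $$

On taking the limit of small mass bins, and assuming that the mass function varies slowly across each bin, the mean value theorem gives

$$ \bar{n}_\alpha \approx n(M_\alpha)\,\Delta M_\alpha \tag{18} $$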

and we may convert Eq. (17) to an integral. Finally, on using the CLF model of Yang et al. (2003), for which $\Phi(L_\mu|M_\alpha) = \bar{N}_g(M_\alpha,L_\mu)/\Delta L_\mu$, we have:

$$ \langle\hat{\phi}(L_\mu)\rangle = \int dM\,n(M)\,\Phi(L_\mu|M) \tag{19} $$

Thus for volume limited samples, estimator E1 is unbiased.
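Equation (19) is a single integral over halo mass. As a minimal sketch, assuming a user-supplied mass function `n_of_m` and CLF `clf` (both hypothetical placeholders, not the models used in this paper), it can be evaluated with the trapezoidal rule on a logarithmically spaced mass grid:

```python
import math


def mean_lf(n_of_m, clf, lum, m_min, m_max, n_steps=2000):
    """Trapezoidal estimate of Eq. (19): <phi(L)> = int dM n(M) Phi(L|M).

    The mass grid is logarithmically spaced because halo mass functions span
    many decades in M; the trapezoid rule is then applied in M itself.
    """
    ln_lo, ln_hi = math.log(m_min), math.log(m_max)
    masses = [math.exp(ln_lo + (ln_hi - ln_lo) * k / n_steps)
              for k in range(n_steps + 1)]
    integrand = [n_of_m(m) * clf(lum, m) for m in masses]
    return sum(0.5 * (masses[k + 1] - masses[k]) * (integrand[k + 1] + integrand[k])
               for k in range(n_steps))


# Hypothetical toy inputs: n(M) = M^-2 on [1, 10] and a mass-independent CLF
# Phi(L|M) = 3, for which the exact answer is 3 * (1 - 1/10) = 2.7.
print(mean_lf(lambda m: m ** -2, lambda lum, m: 3.0, lum=1.0, m_min=1.0, m_max=10.0))
```

In practice `n_of_m` would be a halo mass function and `clf` a CLF parameterisation; the log-spaced grid simply keeps the quadrature accurate over a wide mass range.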

3.2 Estimator covariance

Let us compute the covariance matrix that we would expect for estimator E1. The covariance matrix is defined to be

$$ C_{\mu\nu} = \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle - \langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle \tag{20} $$

where from now on we make use of the compact notation $\hat{\phi}_\mu \equiv \hat{\phi}(L_\mu)$. Focusing on the first term on the right-hand side, and inserting Eq. (11), we find



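$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu V_s^2}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\left\langle\sum_{i=1}^{N^c_\alpha}\sum_{j=1}^{N^c_\beta}\langle N^g_{i\alpha}(L_\mu)\,N^g_{j\beta}(L_\nu)\rangle_g\right\rangle_{P,s} \tag{21} $$

where again we have used the fact that the average over the galaxies can be separated from that over the cluster sample. Considering the contents of the inner bracket, we see that this may be rewritten as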

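$$ \langle N^g_{i\alpha}(L_\mu)N^g_{j\beta}(L_\nu)\rangle_g = \left[1-\delta^K_{\alpha\beta}\delta^K_{ij}\right]\langle N^g_{i\alpha}(L_\mu)\rangle_g\langle N^g_{j\beta}(L_\nu)\rangle_g + \delta^K_{\alpha\beta}\delta^K_{ij}\left\{\epsilon_{\mu\nu}\langle N^g_{i\alpha}(L_\mu)\rangle_g\langle N^g_{i\alpha}(L_\nu)\rangle_g + \delta^K_{\mu\nu}\langle[N^g_{i\alpha}(L_\mu)]^2\rangle_g\right\} \tag{22} $$

where $\delta^K$ is the Kronecker delta and we have made use of a modified Levi-Civita symbol, $\epsilon_{\mu\nu} = 1$ if $\mu\ne\nu$ and 0 otherwise. The first term collects all pairs drawn from distinct haloes, and the second all pairs drawn from the same halo; in both cases we have used the independence of the sampling distributions to separate the expectations of the products. Consider the final term in the above expression: on using Eq. (13), we see that this piece can be rewritten as

$$ \langle[N^g_{i\alpha}(L_\mu)]^2\rangle_g = \langle N^g_{i\alpha}(L_\mu)\rangle_g + \langle N^g_{i\alpha}(L_\mu)\rangle_g^2 = \bar{N}_g(M_\alpha,L_\mu)\left[1+\bar{N}_g(M_\alpha,L_\mu)\right] \tag{23} $$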



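On inserting this back into Eq. (22), we may resum all terms and find that the expression simplifies to

$$ \langle N^g_{i\alpha}(L_\mu)N^g_{j\beta}(L_\nu)\rangle_g = \bar{N}_g(M_\alpha,L_\mu)\,\bar{N}_g(M_\beta,L_\nu) + \bar{N}_g(M_\alpha,L_\mu)\,\delta^K_{ij}\delta^K_{\alpha\beta}\delta^K_{\mu\nu} \tag{24} $$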
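The identity in Eq. (24) can be checked numerically for a single halo ($i=j$, $\alpha=\beta$). The sketch below assumes, as in Eq. (13), independent Poisson occupation of each luminosity bin; the Poisson sampler implements Knuth's multiplication method using uniform deviates from Python's standard `random` module, and the means `lam_mu` and `lam_nu` are arbitrary illustrative values.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def same_halo_moments(lam_mu, lam_nu, n_haloes=100_000, seed=1):
    """Monte Carlo estimates of <N_mu N_nu> (mu != nu) and <N_mu^2> for one halo."""
    rng = random.Random(seed)
    sum_cross = sum_sq = 0
    for _ in range(n_haloes):
        n_mu = poisson_draw(lam_mu, rng)
        n_nu = poisson_draw(lam_nu, rng)  # independent of n_mu by construction
        sum_cross += n_mu * n_nu
        sum_sq += n_mu * n_mu
    return sum_cross / n_haloes, sum_sq / n_haloes


cross, sq = same_halo_moments(2.0, 0.5)
# Eq. (24) predicts cross -> 2.0 * 0.5 = 1.0 and sq -> 2.0 * (1 + 2.0) = 6.0.
print(cross, sq)
```

Only the same-bin moment picks up the extra $\bar{N}_g$ term; the cross-bin moment factorises, which is why the Poisson piece of the covariance is diagonal in $\mu\nu$.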

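If we now return to Eq. (21), then on using the above relation we find:

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu V_s^2}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\left[\langle N^c_\alpha N^c_\beta\rangle_{P,s}\,\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\beta,L_\nu) + \langle N^c_\alpha\rangle_{P,s}\,\bar{N}_g(M_\alpha,L_\mu)\,\delta^K_{\alpha\beta}\delta^K_{\mu\nu}\right] \tag{25} $$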

In order to proceed further we require an expression for the product $\langle N^c_\alpha N^c_\beta\rangle_{P,s}$. Again, this may be obtained by following the arguments presented in Smith & Marian (2011) (summarised in the Appendix). Thus we have

$$ \langle N^c_\alpha N^c_\beta\rangle_{P,s} = S_{\alpha\beta} + V_s^2\,\bar{n}_\alpha\bar{n}_\beta + V_s\,\bar{n}_\alpha\,\delta^K_{\alpha\beta} \tag{26} $$

The first term takes into account the excess variance above random in the number counts, which arises due to the spatial correlations of the clusters:

$$ S_{\alpha\beta} = V_s^2\,\bar{n}_\alpha\bar{n}_\beta\,\bar{b}_\alpha\bar{b}_\beta\,\sigma^2(V_s) \tag{27} $$

where we have defined the effective bias of the clusters in the $\alpha$th mass bin to be

$$ \bar{b}_\alpha = \frac{1}{\bar{n}_\alpha}\int_{M_\alpha-\Delta M_\alpha/2}^{M_\alpha+\Delta M_\alpha/2} dM\,b(M)\,n(M) \tag{28} $$

and also introduced the volume variance

$$ \sigma^2(V_s) = \int\frac{d^3k}{(2\pi)^3}\,|W(\mathbf{k})|^2\,P(k) \tag{29} $$

where $W(\mathbf{k})$ is the Fourier transform of the survey window function and $P(k)$ is the matter power spectrum.

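Substituting Eq. (26) into Eq. (25) gives

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu V_s^2}\left\{\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\beta,L_\nu)\left[S_{\alpha\beta} + V_s^2\,\bar{n}_\alpha\bar{n}_\beta + V_s\,\bar{n}_\alpha\,\delta^K_{\alpha\beta}\right] + \sum_{\alpha=1}^{N_m}V_s\,\bar{n}_\alpha\,\bar{N}_g(M_\alpha,L_\mu)\,\delta^K_{\mu\nu}\right\} \tag{30} $$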



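Using Eqs (27)-(29) in the above expression gives

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\beta,L_\nu)\,\bar{n}_\alpha\bar{n}_\beta\left[\bar{b}_\alpha\bar{b}_\beta\,\sigma^2(V_s)+1\right] + \frac{1}{\Delta L_\mu\Delta L_\nu V_s}\sum_{\alpha=1}^{N_m}\bar{n}_\alpha\,\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\alpha,L_\nu) + \frac{1}{\Delta L_\mu\Delta L_\nu V_s}\sum_{\alpha=1}^{N_m}\bar{n}_\alpha\,\bar{N}_g(M_\alpha,L_\mu)\,\delta^K_{\mu\nu} \tag{31} $$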

Again, if the mass bins are sufficiently narrow, then we may use the mean value theorem to make the approximations $\bar{n}_\alpha \approx n(M_\alpha)\Delta M_\alpha$ and $\bar{b}_\alpha \approx b(M_\alpha)$. This allows us to transform the above expression into integrals over cluster mass. Next, if we subtract off the second term on the right-hand side of Eq. (20), we obtain the covariance matrix of the GLF. Note that this simply removes the $+1$ from the first term in square brackets in Eq. (31). Thus we find

$$ C_{\mu\nu} = \frac{\sigma^2(V_s)}{\Delta L_\mu\Delta L_\nu}\int dM_1\int dM_2\,n(M_1)\,n(M_2)\,b(M_1)\,b(M_2)\,\bar{N}_g(M_1,L_\mu)\,\bar{N}_g(M_2,L_\nu) + \frac{1}{V_s\Delta L_\mu\Delta L_\nu}\int dM_1\,n(M_1)\,\bar{N}_g(M_1,L_\mu)\,\bar{N}_g(M_1,L_\nu) + \frac{1}{V_s\Delta L_\mu\Delta L_\nu}\int dM_1\,n(M_1)\,\bar{N}_g(M_1,L_\mu)\,\delta^K_{\mu\nu} \tag{32} $$



The above expression may be written in a more compact way by introducing the effective bias of galaxies in luminosity bin $L_\mu$,

$$ \bar{b}^g_\mu \equiv b_g(L_\mu) = \frac{1}{\bar{n}^g_\mu}\int dM_1\,n(M_1)\,b(M_1)\,\bar{N}_g(M_1,L_\mu) \tag{33} $$

and the effective number density of galaxies in the luminosity bin $L_\mu$,

$$ \bar{n}^g_\mu \equiv \bar{n}_g(L_\mu) = \int dM_1\,n(M_1)\,\bar{N}_g(M_1,L_\mu) \tag{34} $$
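On using these definitions in Eq. (32), we find:

$$ C_{\mu\nu} = \frac{\bar{n}^g_\mu\bar{n}^g_\nu\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s)}{\Delta L_\mu\Delta L_\nu} + \frac{\bar{n}^g_\mu}{V_s\Delta L_\mu\Delta L_\nu}\,\delta^K_{\mu\nu} + \frac{1}{V_s}\int dM_1\,n(M_1)\,\frac{\bar{N}_g(M_1,L_\mu)\,\bar{N}_g(M_1,L_\nu)}{\Delta L_\mu\Delta L_\nu} \tag{35} $$

Finally, since $\langle\hat{\phi}_\mu\rangle = \bar{n}^g_\mu/\Delta L_\mu$, we may re-express our result in terms of the CLF of galaxies, $\Phi(L_\mu|M)$, as

$$ C_{\mu\nu} = \langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s) + \frac{\langle\hat{\phi}_\mu\rangle}{V_s\,\Delta L_\mu}\,\delta^K_{\mu\nu} + E_{\mu\nu} \tag{36} $$

where we have defined the 'halo occupancy covariance' to be

$$ E_{\mu\nu} = \frac{1}{V_s}\int dM_1\,n(M_1)\,\Phi(L_\mu|M_1)\,\Phi(L_\nu|M_1) \tag{37} $$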

Closer inspection of Eq. (36) reveals several interesting points. The first term tells us that the presence/absence of large-scale structures in the survey volume will enhance/suppress the number of galaxies in our estimates, and that this will lead to bin-to-bin correlations in the estimates of the GLF. The second term is the standard Poisson error term, which dominates in the limit of rare counts. The third term is interesting: it tells us that, if our understanding of galaxy formation is correct and galaxies only appear inside haloes, then, even in the absence of large-scale structure, GLF estimates are correlated. This is because a halo most likely comes with a set of galaxies drawn from $\Phi(L|M)$, so the presence of one galaxy is correlated with the presence of additional galaxies. Finally, we note that Robertson (2010) wrote down terms similar to the first two in our Eq. (36). However, owing to his over-simplistic model for the number of galaxies hosted by a halo of a given mass, he failed to obtain the halo occupancy covariance term.
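Once its ingredients are known, assembling Eq. (36) is simple bookkeeping. The sketch below assumes the per-bin mean GLF, effective galaxy bias, bin widths, $\sigma^2(V_s)$ and the mass integrals $Q_{\mu\nu} = \int dM\,n(M)\,\Phi(L_\mu|M)\,\Phi(L_\nu|M)$ have already been computed elsewhere; the function and argument names are illustrative only.

```python
def glf_covariance_volume_limited(phi, bias, dlum, sigma2, volume, q):
    """Covariance matrix of estimator E1, following Eq. (36).

    phi[mu]   : mean GLF <phi_mu> in each luminosity bin
    bias[mu]  : effective galaxy bias b^g_mu (Eq. 33)
    dlum[mu]  : bin width Delta L_mu
    sigma2    : sigma^2(V_s), the variance of the survey volume (Eq. 29)
    volume    : survey volume V_s
    q[mu][nu] : int dM n(M) Phi(L_mu|M) Phi(L_nu|M), so E_{mu nu} = q / V_s (Eq. 37)
    """
    n_bins = len(phi)
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for mu in range(n_bins):
        for nu in range(n_bins):
            sample_var = phi[mu] * phi[nu] * bias[mu] * bias[nu] * sigma2
            poisson = phi[mu] / (volume * dlum[mu]) if mu == nu else 0.0
            occupancy = q[mu][nu] / volume
            cov[mu][nu] = sample_var + poisson + occupancy
    return cov


# Two-bin toy example with made-up numbers.
cov = glf_covariance_volume_limited(
    phi=[2.0, 1.0], bias=[1.0, 2.0], dlum=[1.0, 1.0],
    sigma2=0.01, volume=100.0, q=[[0.5, 0.2], [0.2, 0.1]])
print(cov)
```

Only the Poisson piece is restricted to the diagonal; the sample variance and occupancy pieces populate every element, which is the origin of the bin-to-bin correlations discussed above.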



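3.3 Luminosity function correlation matrix

A short corollary to this section is that we may now construct the correlation matrix from the covariance matrix:

$$ r_{\mu\nu} \equiv \frac{C_{\mu\nu}}{\sqrt{C_{\mu\mu}\,C_{\nu\nu}}} \tag{38} $$

This obeys the inequality $|r_{\mu\nu}| \le 1$.

Inserting our expression for the covariance matrix given by Eq. (36) into the above definition, we find

$$ r_{\mu\nu} = \frac{\langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s) + \frac{\langle\hat{\phi}_\mu\rangle}{V_s\Delta L_\mu}\,\delta^K_{\mu\nu} + E_{\mu\nu}}{\prod_{\gamma\in\{\mu,\nu\}}\left[\langle\hat{\phi}_\gamma\rangle^2(\bar{b}^g_\gamma)^2\sigma^2(V_s) + \frac{\langle\hat{\phi}_\gamma\rangle}{V_s\Delta L_\gamma} + E_{\gamma\gamma}\right]^{1/2}} \tag{39} $$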



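Let us now factor out the Poisson error terms from the numerator and denominator of Eq. (39). Note that the Poisson term in the numerator may be rewritten as

$$ \frac{\langle\hat{\phi}_\mu\rangle}{V_s\Delta L_\mu}\,\delta^K_{\mu\nu} = \left[\frac{\langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle}{V_s\Delta L_\mu\,V_s\Delta L_\nu}\right]^{1/2}\delta^K_{\mu\nu} \tag{40} $$

Whereupon,

$$ r_{\mu\nu} = \frac{\sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s) + \tilde{E}_{\mu\nu} + \delta^K_{\mu\nu}}{\prod_{\gamma\in\{\mu,\nu\}}\left[N^g_\gamma(\bar{b}^g_\gamma)^2\sigma^2(V_s) + \tilde{E}_{\gamma\gamma} + 1\right]^{1/2}} \tag{41} $$

where we have defined the total number of surveyed galaxies in the luminosity bin $L_\mu$ to be $N^g_\mu = \langle\hat{\phi}_\mu\rangle\,\Delta L_\mu\,V_s$, and where

$$ \tilde{E}_{\mu\nu} \equiv E_{\mu\nu}\,\frac{\sqrt{V_s\Delta L_\mu\,V_s\Delta L_\nu}}{\sqrt{\langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle}} \tag{42} $$

On manipulating the above expression, we find that it may also be written as

$$ \tilde{E}_{\mu\nu} = \frac{\int dM\,n(M)\,\bar{N}_g(M,L_\mu)\,\bar{N}_g(M,L_\nu)}{\prod_{\gamma\in\{\mu,\nu\}}\left\{\int dM\,n(M)\,\bar{N}_g(M,L_\gamma)\right\}^{1/2}} \tag{43} $$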

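Several cases of interest may be noted. If, for the moment, we neglect the halo occupancy covariance, i.e. $\tilde{E}_{\mu\nu}\to 0$, then we note the two cases:

$$ \sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s) \ll 1 \tag{44} $$

$$ \sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s) \gg 1 \tag{45} $$

In the first, the errors are dominated by the Poisson sampling of the galaxies and the covariance matrix is uncorrelated (diagonal). In the second case, the matrix is dominated by the sample covariance, and the matrix can become perfectly correlated:

$$ r_{\mu\nu} \to \frac{\sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(V_s)}{\prod_{\gamma\in\{\mu,\nu\}}\left[N^g_\gamma(\bar{b}^g_\gamma)^2\sigma^2(V_s)\right]^{1/2}} = 1 \tag{46} $$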
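The two limits of Eqs (44)-(46) are easy to see numerically. This is a minimal sketch of Eq. (41), assuming the reduced occupancy matrix $\tilde{E}_{\mu\nu}$ is supplied as an input (it is set to zero here to isolate the sample variance term); the function name and toy numbers are hypothetical.

```python
import math


def glf_correlation(n_gal, bias, sigma2, e_tilde):
    """Correlation matrix r_{mu nu} of GLF estimates, following Eq. (41).

    n_gal[mu]       : N^g_mu, number of galaxies surveyed in bin mu
    bias[mu]        : effective galaxy bias b^g_mu
    sigma2          : sigma^2(V_s)
    e_tilde[mu][nu] : reduced halo occupancy covariance (Eq. 43)
    """
    n_bins = len(n_gal)
    diag = [n_gal[g] * bias[g] ** 2 * sigma2 + e_tilde[g][g] + 1.0
            for g in range(n_bins)]
    r = [[0.0] * n_bins for _ in range(n_bins)]
    for mu in range(n_bins):
        for nu in range(n_bins):
            num = (math.sqrt(n_gal[mu] * n_gal[nu]) * bias[mu] * bias[nu] * sigma2
                   + e_tilde[mu][nu] + (1.0 if mu == nu else 0.0))
            r[mu][nu] = num / math.sqrt(diag[mu] * diag[nu])
    return r


no_occ = [[0.0, 0.0], [0.0, 0.0]]
r_poisson = glf_correlation([10.0, 10.0], [1.0, 1.0], 1e-4, no_occ)  # Eq. (44) regime
r_sample = glf_correlation([1e8, 1e8], [1.0, 1.0], 1e-4, no_occ)     # Eq. (45) regime
print(r_poisson[0][1], r_sample[0][1])
```

With the same $\sigma^2(V_s)$, raising the galaxy counts alone pushes the off-diagonal coefficient from near zero towards unity.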



We may also make the important point that taking $V_s\to\infty$, and hence $\sigma(V_s)\to 0$, does not guarantee that the correlation between different luminosity bins is negligible. As the above equations clearly show, since $N^g_\mu\propto V_s$, it is the quantity $V_s\,\sigma^2(V_s)$ that is required to vanish for negligible correlation to occur. Indeed, for a power-law power spectrum, $P(k)\propto k^n$, and a survey of characteristic size $R$, we would have $V_s\,\sigma^2(V_s)\propto R^3R^{-(3+n)}\propto R^{-n}$, which can only be made to vanish (as $R\to\infty$) for $n>0$. For CDM we have a rolling spectral index, and $n>0$ for $k<0.01\,h\,{\rm Mpc}^{-1}$, which implies that $V_s>0.5\,h^{-3}\,{\rm Gpc}^3$ for the covariance to diminish with increasing volume.
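The power-law scaling argument can be checked numerically. As a minimal sketch, the code evaluates Eq. (29) for a pure power law $P(k)=k^n$ (arbitrary normalisation) and, as an assumption made purely for convenience, a Gaussian window $|W(k)|^2=\exp(-k^2R^2)$ rather than a real survey geometry. For this window the integral has the closed form $\Gamma[(3+n)/2]/(4\pi^2R^{3+n})$, which serves as a check. Taking $V\propto R^3$, the ratio of $V\sigma^2$ at $2R$ to that at $R$ should approach $2^{-n}$.

```python
import math


def sigma2_gaussian_window(n_index, radius, n_steps=4000):
    """Eq. (29) for P(k) = k**n and |W(k)|^2 = exp(-k^2 R^2), by trapezoid in ln k.

    sigma^2 = (1 / 2 pi^2) * int d(ln k) k^3 P(k) |W(k)|^2
    """
    ln_k_lo = math.log(1e-6 / radius)   # integrand is negligible below this
    ln_k_hi = math.log(10.0 / radius)   # and above this (exp(-100) cutoff)
    step = (ln_k_hi - ln_k_lo) / n_steps
    total = 0.0
    for j in range(n_steps + 1):
        k = math.exp(ln_k_lo + j * step)
        f = k ** (3.0 + n_index) * math.exp(-(k * radius) ** 2)
        total += f if 0 < j < n_steps else 0.5 * f  # trapezoid end-point weights
    return total * step / (2.0 * math.pi ** 2)


def sigma2_exact(n_index, radius):
    """Closed form Gamma((3+n)/2) / (4 pi^2 R^(3+n)) for the same window."""
    return math.gamma(0.5 * (3.0 + n_index)) / (4.0 * math.pi ** 2 * radius ** (3.0 + n_index))


for n_index in (-1.0, 0.5):
    ratio = 2.0 ** 3 * sigma2_gaussian_window(n_index, 2.0) / sigma2_gaussian_window(n_index, 1.0)
    print(n_index, ratio, 2.0 ** (-n_index))
```

For $n=-1$ the product $V\sigma^2$ doubles when $R$ doubles, whereas for $n=0.5$ it falls, mirroring the statement that only $n>0$ lets the sample covariance of the counts diminish with volume.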
On the other hand, if we now neglect the sample variance term, i.e. $\sigma^2(V_s)\to 0$, then we have the two cases:

$$ \tilde{E}_{\mu\nu} < 1 \tag{47} $$

$$ \tilde{E}_{\mu\nu} > 1 \tag{48} $$

In the first case the Poisson errors dominate, while in the second the halo occupancy covariance dominates. Thus, through computing the quantity

$$ \tilde{E}_{\mu\mu} = \frac{\int dM\,n(M)\,\bar{N}_g^2(M,L_\mu)}{\int dM\,n(M)\,\bar{N}_g(M,L_\mu)} \tag{49} $$

one can determine the relative importance of the halo occupancy covariance term with respect to the Poisson errors. Notice also that this is independent of the survey volume, and thus in principle sets the lower limit for the magnitude of the bin-to-bin correlations of the luminosity function data. In §6.1.1 we shall explicitly evaluate this expression for a particular CLF model.
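Equation (49) is a ratio of two mass integrals. As a minimal sketch, assuming a hypothetical mass function and mean occupation (placeholders, not the CLF model of §5), it can be evaluated on a logarithmically spaced mass grid:

```python
import math


def trapz_log_grid(func, m_min, m_max, n_steps=2000):
    """Trapezoidal int_{m_min}^{m_max} func(M) dM on a logarithmically spaced grid."""
    ln_lo, ln_hi = math.log(m_min), math.log(m_max)
    masses = [math.exp(ln_lo + (ln_hi - ln_lo) * k / n_steps) for k in range(n_steps + 1)]
    values = [func(m) for m in masses]
    return sum(0.5 * (masses[k + 1] - masses[k]) * (values[k + 1] + values[k])
               for k in range(n_steps))


def occupancy_to_poisson_ratio(n_of_m, mean_occ, m_min, m_max):
    """Diagonal reduced occupancy covariance of Eq. (49): int n N^2 dM / int n N dM."""
    numerator = trapz_log_grid(lambda m: n_of_m(m) * mean_occ(m) ** 2, m_min, m_max)
    denominator = trapz_log_grid(lambda m: n_of_m(m) * mean_occ(m), m_min, m_max)
    return numerator / denominator


# Hypothetical toy: n(M) = M^-2 and a mean occupation N(M) = M on [1, 100].
print(occupancy_to_poisson_ratio(lambda m: m ** -2, lambda m: m, 1.0, 100.0))
```

A value above unity corresponds to the regime of Eq. (48); note that no survey volume enters the calculation, in line with the volume independence of Eq. (49).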



4 FLUX LIMITED SURVEYS
4.1 Expectation of estimator

We now turn to the more complicated case of estimating the GLF in flux limited surveys. Consider an observer at position $\mathbf{x}_o$. If they survey all galaxies down to an apparent magnitude depth of $m_{\rm lim}$, then the GLF may be obtained through use of estimator E2 given in Eq. (2). In terms of the quantities used in §3, this estimator may be expressed as:

$$ \hat{\phi}(L_\mu|\mathbf{x}_o) = \frac{1}{V^{\max}_\mu\,\Delta L_\mu}\sum_{\alpha=1}^{N_m}\sum_{i=1}^{N^c_\alpha}N^g_{i\alpha}(L_\mu)\,\Theta(\mathbf{x}^\alpha_i-\mathbf{x}_o|L_\mu) \tag{50} $$



where $N^c_\alpha$ and $N^g_{i\alpha}$ are as defined in Eq. (11). There are two new components in the above equation. The first modification is that we require a survey selection function $\Theta$, which has the form:

$$ \Theta(\mathbf{x}_i|L_\mu) = \begin{cases} 1 & |\mathbf{x}_i| \le \chi_{\max}(L_\mu) \\ 0 & |\mathbf{x}_i| > \chi_{\max}(L_\mu) \end{cases} \tag{51} $$

where $\chi_{\max}(L_\mu)$ is the maximum distance at which a source of luminosity $L_\mu$, or equivalently absolute magnitude $M_\mu$ (see Eq. (98) for the conversion), can be seen, given the apparent magnitude limit of the survey $m_{\rm lim}$:



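$$ m_{\rm lim} - M_\mu = 5\log_{10}\left[\frac{D_L\big(\chi_{\max}(L_\mu)\big)}{10\,{\rm pc}}\right] \tag{52} $$

where $D_L(\chi)$ is the luminosity distance corresponding to comoving distance $\chi$.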



The second modification is that the survey volume now becomes $V_s\to V_{\max}(L_\mu)\equiv V^{\max}_\mu$, which is the maximum volume in which a galaxy with luminosity $L_\mu$ could have been found, given the flux limit of the survey $m_{\rm lim}$. For a survey of solid angle $\Omega_s$, this can be written

$$ V^{\max}_\mu = \Omega_s\int_0^{\chi_{\max}(L_\mu)} d\chi\,\frac{dV(\chi)}{d\chi} \tag{53} $$

where $dV(\chi)$ is the comoving volume element (per unit solid angle) out to comoving geodesic distance $\chi(a)$. In what follows we shall assume a spatially flat geometry, and so take the survey volume for luminosity $L_\mu$ to be

$$ V^{\max}_\mu = \frac{\Omega_s}{3}\,\chi_{\max}^3(L_\mu) \tag{54} $$
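A minimal sketch of Eqs (52)-(54), under the additional simplifying assumption (ours, for illustration only) that redshifts are small enough that the luminosity distance can be replaced by the comoving distance, with no K- or evolution corrections. Distances are then in $h^{-1}$Mpc when absolute magnitudes are quoted as $M-5\log_{10}h$; the function names are hypothetical.

```python
import math


def chi_max_mpc(m_lim, abs_mag):
    """Maximum distance in Mpc (h^-1 Mpc if abs_mag is M - 5 log10 h).

    Assumes a pure distance modulus, m - M = 5 log10(d / Mpc) + 25, i.e.
    d_L ~ chi at low redshift and no K- or evolution corrections.
    """
    return 10.0 ** ((m_lim - abs_mag - 25.0) / 5.0)


def v_max(m_lim, abs_mag, solid_angle_sr):
    """Eq. (54): V_max = (Omega_s / 3) * chi_max^3 for a spatially flat geometry."""
    return solid_angle_sr / 3.0 * chi_max_mpc(m_lim, abs_mag) ** 3


# Example: a full-sky survey (4 pi sr) with m_lim - M = 30 reaches 10 Mpc.
print(v_max(m_lim=10.0, abs_mag=-20.0, solid_angle_sr=4.0 * math.pi))
```

Because $V^{\max}_\mu\propto\chi_{\max}^3$, each magnitude of extra faintness shrinks the accessible volume by a factor of $10^{0.6}\approx 4$, which is why the faint bins of a flux limited sample carry the largest Poisson errors.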



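The expectation of the GLF estimator can be written

$$ \langle\hat{\phi}(L_\mu)\rangle = \frac{1}{V^{\max}_\mu\Delta L_\mu}\left\langle\sum_{\alpha=1}^{N_m}\sum_{i=1}^{N^c_\alpha}N^g_{i\alpha}(L_\mu)\,\Theta(\mathbf{x}^\alpha_i|L_\mu)\right\rangle_{g,P,s} = \frac{1}{V^{\max}_\mu\Delta L_\mu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\left\langle\sum_{i=1}^{N^c_\alpha}\Theta(\mathbf{x}^\alpha_i|L_\mu)\right\rangle_{P,s} \tag{55} $$

where, for convenience, we have taken $\mathbf{x}_o$ as the origin of the coordinate system. The last factor in the above equation simply gives the number of clusters in mass bin $\alpha$ that host galaxies of luminosity $L_\mu$ which would be detected in the survey volume. We define this as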



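$$ N^c_\alpha(L_\mu) \equiv \sum_{i=1}^{N^c_\alpha}\Theta(\mathbf{x}^\alpha_i|L_\mu) \tag{56} $$

On averaging the above expression over the sampling distributions, we find

$$ \langle N^c_\alpha(L_\mu)\rangle_{P,s} = \bar{n}_\alpha\,V^{\max}_\mu \tag{57} $$

Substituting this back into Eq. (55) we arrive at the result:

$$ \langle\hat{\phi}(L_\mu)\rangle = \frac{1}{V^{\max}_\mu\Delta L_\mu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\,\bar{n}_\alpha\,V^{\max}_\mu \tag{58} $$

In the limit of small mass bins we may approximate $\bar{n}_\alpha\approx\Delta M_\alpha\,n(M_\alpha)$, and hence rewrite the above expression in integral form as

$$ \langle\hat{\phi}(L_\mu)\rangle = \int dM\,n(M)\,\Phi(L_\mu|M) \tag{59} $$

where again we have used $\Phi(L_\mu|M_\alpha) = \bar{N}_g(M_\alpha,L_\mu)/\Delta L_\mu$. This agrees with the result for the volume limited survey, and hence when dealing with a flux-limited survey E2 is also formally an unbiased estimator.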

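4.2 Estimator covariance

The covariance matrix of GLF estimator E2 can be written

$$ C_{\mu\nu} = \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle - \langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle \tag{60} $$

Similar to our analysis of Eq. (21), let us focus on the first term on the right-hand side; on inserting Eq. (50), we find that this can be written

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu V^{\max}_\mu V^{\max}_\nu}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\left\langle\sum_{i=1}^{N^c_\alpha}\sum_{j=1}^{N^c_\beta}\langle N^g_{i\alpha}(L_\mu)\,N^g_{j\beta}(L_\nu)\rangle_g\,\Theta(\mathbf{x}^\alpha_i|L_\mu)\,\Theta(\mathbf{x}^\beta_j|L_\nu)\right\rangle_{P,s} \tag{61} $$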

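The inner average over the galaxy population is given by Eq. (24), and after inserting this into Eq. (61) we find

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu V^{\max}_\mu V^{\max}_\nu}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\left\{\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\beta,L_\nu)\left\langle\sum_{i=1}^{N^c_\alpha}\sum_{j=1}^{N^c_\beta}\Theta(\mathbf{x}^\alpha_i|L_\mu)\,\Theta(\mathbf{x}^\beta_j|L_\nu)\right\rangle_{P,s} + \bar{N}_g(M_\alpha,L_\mu)\,\delta^K_{\alpha\beta}\delta^K_{\mu\nu}\left\langle\sum_{i=1}^{N^c_\alpha}\Theta(\mathbf{x}^\alpha_i|L_\mu)\right\rangle_{P,s}\right\} \tag{62} $$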
where in obtaining the last term in the above expression we used the fact that $\Theta^2(\mathbf{x}^\alpha_i|L_\mu) = \Theta(\mathbf{x}^\alpha_i|L_\mu)$. Using Eq. (56), we may rewrite the correlation of 'observable' clusters that host galaxies in the luminosity bins $L_\mu$ and $L_\nu$ as

$$ \left\langle\sum_{i=1}^{N^c_\alpha}\sum_{j=1}^{N^c_\beta}\Theta(\mathbf{x}^\alpha_i|L_\mu)\,\Theta(\mathbf{x}^\beta_j|L_\nu)\right\rangle_{P,s} = \langle N^c_\alpha(L_\mu)\,N^c_\beta(L_\nu)\rangle_{P,s} \tag{63} $$

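The above expression may be evaluated in exactly the same way as Eq. (26); however, in this case we must take into account that the survey volume varies with the luminosity bin. Since the survey volume for the fainter bin is nested inside that for the brighter bin, the shot-noise term involves the smaller of the two volumes. This leads us to write

$$ \langle N^c_\alpha(L_\mu)\,N^c_\beta(L_\nu)\rangle_{P,s} = S_{\alpha\beta}(L_\mu,L_\nu) + V^{\max}_\mu\bar{n}_\alpha\,V^{\max}_\nu\bar{n}_\beta + \min[V^{\max}_\mu,V^{\max}_\nu]\,\bar{n}_\alpha\,\delta^K_{\alpha\beta} \tag{64} $$

where we have defined the quantity

$$ S_{\alpha\beta}(L_\mu,L_\nu) = V^{\max}_\mu V^{\max}_\nu\,\bar{n}_\alpha\bar{n}_\beta\,\bar{b}_\alpha\bar{b}_\beta\,\sigma^2(L_\mu,L_\nu) \tag{65} $$

with

$$ \sigma^2(L_\mu,L_\nu) = \int\frac{d^3k}{(2\pi)^3}\,P(k)\,W(\mathbf{k}|L_\mu)\,W^*(\mathbf{k}|L_\nu) \tag{66} $$

The quantity $W(\mathbf{k}|L_\mu)$ represents the Fourier transform of the window function associated with the survey volume for galaxies of luminosity $L_\mu$. Explicitly, this is written

$$ W(\mathbf{k}|L_\mu) = \frac{1}{V^{\max}_\mu}\int d^3x\,\exp[i\mathbf{k}\cdot\mathbf{x}]\,\Theta(\mathbf{x}|L_\mu) \tag{67} $$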

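On inserting Eqs (63)-(66) into Eq. (62), we find

$$ \langle\hat{\phi}_\mu\hat{\phi}_\nu\rangle = \frac{1}{\Delta L_\mu\Delta L_\nu}\sum_{\alpha=1}^{N_m}\sum_{\beta=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\beta,L_\nu)\,\bar{n}_\alpha\bar{n}_\beta\left[\bar{b}_\alpha\bar{b}_\beta\,\sigma^2(L_\mu,L_\nu)+1\right] + \frac{1}{\Delta L_\mu\Delta L_\nu}\frac{\min[V^{\max}_\mu,V^{\max}_\nu]}{V^{\max}_\mu V^{\max}_\nu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\bar{N}_g(M_\alpha,L_\nu)\,\bar{n}_\alpha + \frac{1}{V^{\max}_\mu\Delta L_\mu\Delta L_\nu}\sum_{\alpha=1}^{N_m}\bar{N}_g(M_\alpha,L_\mu)\,\bar{n}_\alpha\,\delta^K_{\mu\nu} \tag{68} $$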

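On inserting the above expression into Eq. (60) we obtain the covariance matrix of the GLF estimates. Note that the subtraction of the term $\langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle$ simply corresponds to removing the $+1$ from the above expression. In the limit of narrow mass bins we may approximate $\bar{n}_\alpha\approx n(M_\alpha)\Delta M_\alpha$, and the above sums may also be converted into integrals. Finally, on using the relation $\Phi(L|M) = \bar{N}_g(M,L)/\Delta L$, and noting that $\min[V^{\max}_\mu,V^{\max}_\nu]/(V^{\max}_\mu V^{\max}_\nu) = 1/\max[V^{\max}_\mu,V^{\max}_\nu]$, we find the covariance matrix for flux-limited GLF estimates to be

$$ C_{\mu\nu} = \langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(L_\mu,L_\nu) + \frac{\langle\hat{\phi}_\mu\rangle}{V^{\max}_\mu\Delta L_\mu}\,\delta^K_{\mu\nu} + E^{\rm F}_{\mu\nu} \tag{69} $$

In the above, we have defined the halo occupancy covariance matrix for flux limited surveys to be

$$ E^{\rm F}_{\mu\nu} = \frac{1}{\max[V^{\max}_\mu,V^{\max}_\nu]}\int dM\,n(M)\,\Phi(L_\mu|M)\,\Phi(L_\nu|M) \tag{70} $$

Note that if we take $V^{\max}_\mu\to V_s$, then we exactly recover our earlier result of Eq. (36) for the volume limited sample.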
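As a minimal sketch of Eqs (69)-(70), assuming the same precomputed ingredients as in the volume limited case but with per-bin volumes `vmax` and a matrix `sigma2[mu][nu]` standing for $\sigma^2(L_\mu,L_\nu)$ (all names illustrative), the only structural changes are the per-bin Poisson volume and the $1/\max[V^{\max}_\mu,V^{\max}_\nu]$ factor in the occupancy term:

```python
def glf_covariance_flux_limited(phi, bias, dlum, sigma2, vmax, q):
    """Covariance matrix of estimator E2 for a flux limited survey, Eqs (69)-(70).

    phi[mu], bias[mu], dlum[mu] : mean GLF, effective bias and bin width
    sigma2[mu][nu]              : sigma^2(L_mu, L_nu), Eq. (66)
    vmax[mu]                    : V_max for luminosity bin mu, Eq. (54)
    q[mu][nu]                   : int dM n(M) Phi(L_mu|M) Phi(L_nu|M)
    """
    n_bins = len(phi)
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for mu in range(n_bins):
        for nu in range(n_bins):
            sample_var = phi[mu] * phi[nu] * bias[mu] * bias[nu] * sigma2[mu][nu]
            poisson = phi[mu] / (vmax[mu] * dlum[mu]) if mu == nu else 0.0
            occupancy = q[mu][nu] / max(vmax[mu], vmax[nu])  # E^F_{mu nu}
            cov[mu][nu] = sample_var + poisson + occupancy
    return cov


# Toy example: the faint bin (index 0) is visible over a smaller volume.
cov_flux = glf_covariance_flux_limited(
    phi=[2.0, 1.0], bias=[1.0, 2.0], dlum=[1.0, 1.0],
    sigma2=[[0.01, 0.01], [0.01, 0.01]], vmax=[50.0, 200.0],
    q=[[0.5, 0.2], [0.2, 0.1]])
print(cov_flux)
```

Setting every entry of `vmax` to a common $V_s$, with a constant `sigma2` matrix, reproduces the volume limited expression of Eq. (36).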



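4.3 Luminosity function correlation matrix

Following the discussion of §3.3 and from Eq. (41), we may write the correlation matrix for flux limited surveys as:

$$ r_{\mu\nu} = \frac{\sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(L_\mu,L_\nu) + \tilde{E}^{\rm F}_{\mu\nu} + \delta^K_{\mu\nu}}{\prod_{\gamma\in\{\mu,\nu\}}\left[N^g_\gamma(\bar{b}^g_\gamma)^2\sigma^2(L_\gamma,L_\gamma) + \tilde{E}^{\rm F}_{\gamma\gamma} + 1\right]^{1/2}} \tag{71} $$

where we have defined the total number of surveyed galaxies in the luminosity bin $L_\mu$ to be $N^g_\mu = \langle\hat{\phi}_\mu\rangle\,\Delta L_\mu\,V^{\max}_\mu$, and where

$$ \tilde{E}^{\rm F}_{\mu\nu} \equiv E^{\rm F}_{\mu\nu}\,\frac{\sqrt{V^{\max}_\mu\Delta L_\mu\,V^{\max}_\nu\Delta L_\nu}}{\sqrt{\langle\hat{\phi}_\mu\rangle\langle\hat{\phi}_\nu\rangle}} \tag{72} $$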

Note that for the diagonal elements of this matrix it can be shown that $\tilde{E}^{\rm F}_{\mu\mu} = \tilde{E}_{\mu\mu}$, since the volume factors cancel when $\mu=\nu$. As before, several cases of interest may be noted. Firstly, if we neglect the occupancy covariance, $\tilde{E}^{\rm F}_{\mu\nu}\to 0$, then we have:

$$ \sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(L_\mu,L_\nu) \ll 1 \tag{73} $$

$$ \sqrt{N^g_\mu N^g_\nu}\,\bar{b}^g_\mu\bar{b}^g_\nu\,\sigma^2(L_\mu,L_\nu) \gg 1 \tag{74} $$

As in §3.3, the first condition leads to Poisson dominated counts and an uncorrelated matrix, and the second to a perfectly correlated matrix. We thus deduce that, for the case of the flux limited survey, the sample covariance will only vanish when $\sqrt{V^{\max}_\mu V^{\max}_\nu}\,\sigma^2(L_\mu,L_\nu)\to 0$.

Alternatively, for the case of no sample variance, $\sigma^2(L_\mu,L_\nu)\to 0$, we have the same situation as for the volume limited case, and Eq. (49) provides an indication of the relative strength of the halo occupancy covariance with respect to the Poisson noise, which is independent of the survey volume.

In the following sections we will attempt to quantify the 
level of covariance in volume limited and flux limited GLF 
estimates. 



5 EMPIRICAL RESULTS AND MODELLING 

In this section we briefly summarise the procedures that we 
use for modelling the GLF, the CLF and the luminosity 
dependence of galaxy clustering. 



5.1 An empirical luminosity function

Over the past few decades, the Schechter function has been found to provide a reasonably good description of the GLF:

$$ \phi(L)\,dL = \phi_*\left(\frac{L}{L_*}\right)^{\alpha}\exp\left(-\frac{L}{L_*}\right)\frac{dL}{L_*} \tag{75} $$

where $L_*$ and $\phi_*$ are the characteristic luminosity and number density of the surveyed galaxies, and $\alpha$ describes the power-law slope of the faint end.

For the 2dFGRS survey, the best-fit Schechter function parameters for galaxies with K-corrected $b_J$ luminosities in the range $-22.5 \le M_{b_J}-5\log_{10}h \le -14.0$ were found to be (Norberg et al. 2002): $L_* = 9.64\times10^{9}\,h^{-2}L_\odot$, $\alpha = -1.21$ and $\phi_* = 1.61\times10^{-2}\,h^{3}\,{\rm Mpc}^{-3}$.
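To make Eq. (75) concrete, here is a minimal sketch that evaluates the Schechter function with the 2dFGRS parameters quoted above and integrates it over a luminosity bin with the trapezoidal rule; the function names are illustrative, and the bin integration is simply one convenient way to turn $\phi(L)$ into a number density per bin.

```python
import math

# 2dFGRS b_J-band Schechter parameters quoted above (Norberg et al. 2002).
L_STAR = 9.64e9      # h^-2 L_sun
ALPHA = -1.21
PHI_STAR = 1.61e-2   # h^3 Mpc^-3


def schechter(lum, phi_star=PHI_STAR, l_star=L_STAR, alpha=ALPHA):
    """Eq. (75): phi(L) = (phi_* / L_*) (L / L_*)^alpha exp(-L / L_*), per unit L.

    lum must be > 0 when alpha < 0.
    """
    x = lum / l_star
    return phi_star / l_star * x ** alpha * math.exp(-x)


def schechter_bin(l_lo, l_hi, n_steps=1000, **params):
    """Number density (h^3 Mpc^-3) in [l_lo, l_hi] by trapezoidal integration."""
    step = (l_hi - l_lo) / n_steps
    values = [schechter(l_lo + k * step, **params) for k in range(n_steps + 1)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


# At L = L_* the dimensionless combination L * phi(L) equals phi_* / e.
print(L_STAR * schechter(L_STAR), PHI_STAR / math.e)
print(schechter_bin(L_STAR, 2.0 * L_STAR))
```

Binned values of this kind play the role of $\langle\hat{\phi}_\mu\rangle\,\Delta L_\mu$ in the covariance expressions of §§3-4.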
Figure 1. The galaxy luminosity function in the 2dFGRS. In both panels, the solid red points with errors show the measurements from the 2dFGRS and the dashed red line denotes the Schechter function fit from Norberg et al. (2002). Left panel: comparison of the 2dFGRS results with the luminosity function estimates made in §5.2 from the semi-analytic model galaxy catalogue of Croton et al. (2006). Right panel: comparison of the 2dFGRS results with the conditional luminosity function (CLF) model of Yang et al. (2003), denoted by the solid blue line. Note that the magenta dot-dashed line shows the effect of convolving the CLF model magnitudes with the lognormal magnitude error model for the 2dFGRS as described by Norberg et al. (2002).



5.2 Luminosity function of semi-analytic galaxies 

As was shown by Kauffmann et al. (1999) and Cole et al. (2000), semi-analytic models (SAM) of galaxy formation are a promising way to attempt to understand the complex physics of galaxy formation. The main advantage of this approach is that it allows one to rapidly explore the effects of physical scaling relations on the observational properties of galaxies. This property also makes it a useful tool for generating mock galaxy catalogues.

In this study we make use of the publicly available SAM catalogues of Croton et al. (2006). These model galaxies were generated by carefully following the detailed merger histories of dark matter haloes within the Millennium Run, an N-body simulation that followed the non-linear evolution of structure formation with $N = 2160^3$ dark matter particles in a cubical box of length $L = 500\,h^{-1}{\rm Mpc}$. The cosmological model for this simulation was $\{\Omega_m = 0.25, \Omega_\Lambda = 0.75, n_s = 1.0, \sigma_8 = 0.9, h = 0.73\}$, where these are the matter and vacuum energy density parameters, the primordial power-spectral index, the power spectrum normalisation, and the dimensionless Hubble parameter, respectively (for full details see Springel et al. 2005). Through a novel treatment of 'radio-mode' AGN feedback, the authors were able to show that the predicted bright end of the GLF could be qualitatively reconciled with observations from the 2dFGRS.

In these catalogues, galaxy magnitudes are available in either the BVRIK (Vega) or the ugriz (SDSS AB) filters. Owing to the limited mass resolution of the Millennium Run, the SAM galaxies can only be followed reliably down to $M_{b_J} - 5\log_{10}h < -15.6$ ($L_{b_J} \approx 2\times10^{8}\,h^{-2}L_\odot$), whereas the 2dFGRS sample extends about 1.6 magnitudes fainter, to $M_{b_J} - 5\log_{10}h = -14.0$. Nevertheless, the catalogues contain nearly 10 million galaxies in the full simulation box (cf. Table 1), roughly 40 times more mock galaxies than are found in the 2dFGRS.

Figure 1, left panel, compares the 2dFGRS $b_J$ GLF with the GLF estimates obtained using Eq. (1) for the SAM galaxies. From this it can be seen that the Croton et al. (2006) galaxies do indeed provide a qualitatively good description of the 2dFGRS GLF. The largest deviations occur for galaxies with $L > L_*$. We also see the drop-off in the number density of objects around $L \sim 2\times10^{8}\,h^{-2}L_\odot$, due to the limited mass resolution of the simulation.



5.3 Conditional luminosity function 

The CLF was first introduced by Yang et al. (2003). We now summarise their model, which has been highly successful at reproducing a large number of observational results from the 2dFGRS. Again, given a halo of mass $M$, the CLF returns the number of galaxies per unit luminosity interval $dL$. It can be represented by a Schechter-type function:



$$\Phi(L|M)\,dL = \Phi_*\left(\frac{L}{L_*}\right)^{\alpha}\exp\left(-\frac{L}{L_*}\right)\frac{dL}{L_*}, \tag{76}$$



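where the three quantities $\Phi_* = \Phi_*(M)$, $L_* = L_*(M)$ and $\alpha = \alpha(M)$ all depend on halo mass. These in turn are described by the following functions:

$$\alpha(M) = \alpha_{15} + \eta\log_{10}\left(M/[10^{15}\,h^{-1}M_\odot]\right); \tag{77}$$

$$L_*(M) = \frac{2M}{(M/L)_0\,f(\alpha)}\left[\left(\frac{M}{M_1}\right)^{-\beta} + \left(\frac{M}{M_2}\right)^{\gamma_2}\right]^{-1}; \tag{78}$$

$$\Phi_*(M) = \frac{\langle L\rangle(M)}{L_*\,\Gamma[\alpha+2]}. \tag{79}$$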
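With the additional auxiliary functions:

$$\langle L\rangle(M) = \frac{2M}{(M/L)_0}\left[\left(\frac{M}{M_1}\right)^{-\beta} + \left(\frac{M}{M_1}\right)^{\gamma_1}\right]^{-1}; \tag{80}$$

$$f(\alpha) = \frac{\Gamma[\alpha+2]}{\Gamma[\alpha+1,1]}, \tag{81}$$

where $\Gamma[x]$ and $\Gamma[x,a]$ are the Gamma and (upper) incomplete Gamma functions, respectively:

$$\Gamma[x] = \int_0^\infty dz\,z^{x-1}\exp(-z); \tag{82}$$

$$\Gamma[x,a] = \int_a^\infty dz\,z^{x-1}\exp(-z). \tag{83}$$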



There are 8 free parameters in the above model; these are augmented by one final parameter, $M_{\rm min}$, which specifies the minimum mass of a halo that may host a galaxy. For these 9 parameters we use the best-fit values reported in Yang et al. (2003): $\mathbf{p} = \{\alpha_{15} = -1.32,\ \eta = -0.36,\ \log_{10}M_1 = 10.42,\ \log_{10}M_2 = 11.74,\ (M/L)_0 = 102,\ \beta = 0.6,\ \gamma_1 = 0.28,\ \gamma_2 = 0.69,\ \log_{10}M_{\rm min} = 9.0\}$.
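A minimal sketch of the CLF with the parameter set $\mathbf{p}$ quoted above is given below. It assumes the Yang et al. (2003) forms $\alpha(M) = \alpha_{15} + \eta\log_{10}(M/10^{15}h^{-1}M_\odot)$, $\langle L\rangle(M) = 2M(M/L)_0^{-1}[(M/M_1)^{-\beta} + (M/M_1)^{\gamma_1}]^{-1}$ and $L_*(M) = 2M[(M/L)_0 f(\alpha)]^{-1}[(M/M_1)^{-\beta} + (M/M_2)^{\gamma_2}]^{-1}$ with $f(\alpha) = \Gamma[\alpha+2]/\Gamma[\alpha+1,1]$, and fixes $\Phi_*$ so that the first moment of the CLF equals $\langle L\rangle(M)$. The function names are hypothetical, and the cut at $M_{\rm min}$ is not applied here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Best-fit CLF parameters quoted in the text (Yang et al. 2003).
# Masses in h^-1 M_sun, luminosities in h^-2 L_sun, (M/L)_0 in h M_sun / L_sun.
PARAMS = dict(alpha15=-1.32, eta=-0.36, logM1=10.42, logM2=11.74,
              ML0=102.0, beta=0.6, gamma1=0.28, gamma2=0.69)

def upper_gamma(x, a):
    """Upper incomplete Gamma function Gamma[x, a] = int_a^inf z^(x-1) e^-z dz,
    evaluated by quadrature so that x <= 0 is also allowed (requires a > 0 then)."""
    value, _ = quad(lambda z: z**(x - 1.0) * np.exp(-z), a, np.inf)
    return value

def mean_luminosity(M, p=PARAMS):
    """Mean total luminosity <L>(M) of a halo of mass M."""
    M1 = 10.0**p["logM1"]
    return 2.0 * M / p["ML0"] / ((M / M1)**(-p["beta"]) + (M / M1)**p["gamma1"])

def clf_parameters(M, p=PARAMS):
    """Return (Phi_star, L_star, alpha) of the CLF for a halo of mass M."""
    M1, M2 = 10.0**p["logM1"], 10.0**p["logM2"]
    alpha = p["alpha15"] + p["eta"] * np.log10(M / 1e15)
    f_alpha = gamma(alpha + 2.0) / upper_gamma(alpha + 1.0, 1.0)
    L_star = 2.0 * M / (p["ML0"] * f_alpha) / (
        (M / M1)**(-p["beta"]) + (M / M2)**p["gamma2"])
    # normalisation chosen so that int L Phi(L|M) dL = <L>(M)
    Phi_star = mean_luminosity(M, p) / (L_star * gamma(alpha + 2.0))
    return Phi_star, L_star, alpha

def clf(L, M, p=PARAMS):
    """Phi(L|M): number of galaxies per unit luminosity in a halo of mass M."""
    Phi_star, L_star, alpha = clf_parameters(M, p)
    x = np.asarray(L, dtype=float) / L_star
    return Phi_star * x**alpha * np.exp(-x) / L_star

for M in (1e11, 1e12, 1e13, 1e14):
    Phi_star, L_star, alpha = clf_parameters(M)
    print(f"M = {M:.0e}: alpha = {alpha:.2f}, L_star = {L_star:.2e}, Phi_star = {Phi_star:.2f}")
```

The incomplete Gamma function is evaluated by direct quadrature because $\alpha + 1$ becomes negative for massive haloes, whereas SciPy's regularised `gammaincc` is documented for a positive first argument.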

Finally, the GLF can be obtained from the CLF by integrating over the halo mass function, as described in Eq. (19). Note that, in evaluating Eq. (19), we follow exactly the recipe presented in Yang et al. (2003) and adopt the Sheth & Tormen (1999) mass function and the transfer function of Bardeen et al. (1986). This is necessary because, if we were to adopt other models, the quoted parameters would no longer be expected to be the maximum-likelihood parameter set. Since this is a first calculation we do not regard the resulting mismatch with the Millennium cosmology as a serious limitation; however, for a more precise calculation one should re-optimise $\mathbf{p}$ for the true cosmological model.
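The mass integral can be sketched numerically as follows, assuming Eq. (19) takes the standard form $\phi(L) = \int_{M_{\rm min}} dM\,n(M)\,\Phi(L|M)$, carried out here in $\ln M$. The mass function and CLF in the usage example are hypothetical toy functions, not the Sheth & Tormen (1999) mass function or the Yang et al. CLF; any CLF that broadcasts over luminosity and mass arrays could be passed in instead.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis (written out to avoid NumPy-version differences)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def glf_from_clf(L, clf, dn_dlnM, lnM_min=np.log(1e9), lnM_max=np.log(1e16), n_mass=2000):
    """phi(L) = int_{M_min} dM n(M) Phi(L|M), integrated on a grid in ln M.

    clf(L, M)  : callable returning Phi(L|M); must broadcast over L and M arrays
    dn_dlnM(M) : callable returning M n(M), the mass function per unit ln M
    """
    lnM = np.linspace(lnM_min, lnM_max, n_mass)
    M = np.exp(lnM)
    L = np.atleast_1d(np.asarray(L, dtype=float))
    integrand = dn_dlnM(M)[None, :] * clf(L[:, None], M[None, :])   # shape (n_L, n_mass)
    return trapezoid(integrand, lnM)

# Hypothetical toy inputs, for illustration only
toy_mass_function = lambda M: 1e-3 * (M / 1e12)**-0.9 * np.exp(-M / 1e14)   # h^3 Mpc^-3
toy_clf = lambda L, M: (M / 1e12) * np.exp(-L / 1e10) / 1e10                # per h^-2 L_sun
print(glf_from_clf([1e9, 1e10], toy_clf, toy_mass_function))
```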

Figure 1, right panel, compares the 2dFGRS $b_J$ GLF with the GLF obtained from Eq. (19). As can be seen from the figure, the CLF model of Yang et al. (2003) (solid blue line) provides a qualitatively good description of the 2dFGRS data. Note that the optimised best-fit parameters described in the paper of Yang et al. do not take into account the presence of magnitude errors in the 2dFGRS data. If we convolve the model magnitudes with the log-normal distribution described in Norberg et al. (2002), then we find a small increase in the abundance of the brightest galaxies. Appendix B describes the inclusion of magnitude errors.
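The effect of magnitude errors described above can be illustrated with a short sketch: a log-normal error in luminosity is a Gaussian error in magnitude, so the observed GLF is the true one convolved with a Gaussian kernel in $M$. The Schechter parameters are the 2dFGRS values quoted in §5.1 (with $M_* - 5\log_{10}h = -19.66$, as derived in §6.2), but the error width $\sigma_M = 0.15$ is a hypothetical value, not the Norberg et al. (2002) error model.

```python
import numpy as np

def schechter_mag(M, M_star=-19.66, alpha=-1.21, phi_star=1.61e-2):
    """Schechter function per unit absolute magnitude (2dFGRS values from the text)."""
    x = 10.0**(0.4 * (M_star - M))                  # L / L_star
    return 0.4 * np.log(10.0) * phi_star * x**(alpha + 1.0) * np.exp(-x)

def convolve_mag_errors(phi, dM, sigma_M):
    """Convolve phi(M), sampled on a uniform grid of step dM, with a Gaussian
    magnitude error of width sigma_M (equivalently a log-normal error in L)."""
    half = int(np.ceil(5.0 * sigma_M / dM))
    offsets = np.arange(-half, half + 1) * dM
    kernel = np.exp(-0.5 * (offsets / sigma_M)**2)
    kernel /= kernel.sum()                          # unit-normalised discrete kernel
    return np.convolve(phi, kernel, mode="same")    # odd, symmetric kernel stays centred

dM = 0.01
M = np.arange(-25.0, -12.0 + dM / 2, dM)
phi_true = schechter_mag(M)
phi_obs = convolve_mag_errors(phi_true, dM, sigma_M=0.15)   # hypothetical error width
i = np.argmin(np.abs(M + 22.5))
print(f"phi_obs / phi_true at M = -22.5: {phi_obs[i] / phi_true[i]:.2f}")
```

Because the exponential cut-off is steep, more galaxies scatter to bright magnitudes than scatter back, so the bright-end ratio exceeds unity, in the same sense as the small increase in the abundance of the brightest galaxies noted above.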



5.4 Luminosity dependence of galaxy clustering 

In order to make predictions for the covariance matrix of the GLF we must also understand the luminosity dependence of the bias of the galaxy distribution. We explore this using the SAM galaxies (Croton et al. 2006). First, the galaxy catalogue is sliced into 8 bins in absolute magnitude. The exact magnitude bins that we employ and the numbers of galaxies in each bin are presented in Table 1. The correlation functions for the SAM galaxies were then estimated


1 Note that, in evaluating the CLF model, we adopt the cosmological parameters used by Yang et al. (2003): $\{\Omega_m = 0.3, \Omega_\Lambda = 0.7, n_s = 1.0, \sigma_8 = 0.9\}$. These are slightly different from those used in the Millennium simulation; however, they are the same as those used in the estimation of the 2dFGRS GLF.



Table 1. Col. 1: bin number; Col. 2: the absolute magnitude limits of the bin; Col. 3: number of galaxies within the bin, from which we calculate the correlation functions.

Bin      Magnitude range          Number of
number   [M_bJ - 5 log10 h]       galaxies
1        < -20.8                     15,448
2        [-20.0, -20.8]             130,447
3        [-19.3, -20.0]             471,467
4        [-18.6, -19.3]             876,150
5        [-17.8, -18.6]           1,254,400
6        [-17.1, -17.8]           1,690,406
7        [-16.4, -17.1]           2,367,636
8        [-15.6, -16.4]           3,119,262



using our parallel tree-code correlation function algorithm DualTreeTwoPoint, which is based on the kd-tree approach of Moore et al. (2001). The correlation functions were estimated in 40 logarithmically spaced bins in the radial interval $r \in [0.05, 50.0]\,h^{-1}{\rm Mpc}$.
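The DualTreeTwoPoint code itself is not reproduced here. As a stand-in, the sketch below computes $\xi(r)$ in logarithmic bins for points in a periodic box, using SciPy's `cKDTree.count_neighbors` (a dual-tree pair count) for DD and the analytic random-pair expectation for RR, i.e. the natural estimator $\xi = DD/RR - 1$. The box size, point set and binning are illustrative assumptions, and the analytic RR is only exact for $r_{\rm max} < L_{\rm box}/2$.

```python
import numpy as np
from scipy.spatial import cKDTree

def xi_periodic(pos, box, r_edges):
    """Natural estimator xi = DD/RR - 1 for points in a periodic cube of side `box`."""
    n = len(pos)
    tree = cKDTree(pos, boxsize=box)               # periodic boundary conditions
    # cumulative ordered-pair counts within each radius (includes n self-pairs)
    cum = tree.count_neighbors(tree, r_edges)
    dd = np.diff(cum)                               # ordered pairs per radial shell
    shell_vol = 4.0 / 3.0 * np.pi * np.diff(r_edges**3)
    rr = n * (n - 1) * shell_vol / box**3           # expected ordered pairs for random points
    return dd / rr - 1.0

rng = np.random.default_rng(42)
box = 100.0                                         # h^-1 Mpc, hypothetical test box
pos = rng.uniform(0.0, box, size=(20000, 3))
r_edges = np.logspace(np.log10(0.5), 1.0, 13)       # 12 log bins, 0.5-10 h^-1 Mpc
xi = xi_periodic(pos, box, r_edges)
print(np.round(xi, 3))
```

For uniformly distributed points the estimates scatter around zero; applied to the galaxies in one magnitude bin of Table 1, the same function would return that bin's correlation function.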

Figure 2 shows the results for the galaxy correlation functions measured in the 8 luminosity bins presented in Table 1. On scales $r > 3\,h^{-1}{\rm Mpc}$ the signal appears to follow a power-law-like form, with the brightest sample of galaxies being significantly more strongly correlated than the lower luminosity galaxies. On smaller scales, however, the signal is more complex: there appears to be a strong scale dependence, with lower luminosity galaxies becoming more strongly correlated than intermediate luminosity galaxies.

Figure 3 quantifies this scale dependence in more detail, where we plot the relative bias of the SAM galaxies as a function of scale. We define the relative galaxy bias as:

$$b_{\rm rel}(L_\mu, L_\nu) \equiv \left[\frac{\xi(r|L_\mu)}{\xi(r|L_\nu)}\right]^{1/2}. \tag{84}$$



The figure shows $b_{\rm rel}(L_\mu, L_{\rm min})$. On scales $r > 3\,h^{-1}{\rm Mpc}$ the bias is reasonably flat for all of the bins, but, interestingly, the lower luminosity galaxies can be more strongly correlated than the intermediate luminosity bins. Furthermore, on scales smaller than $r \sim 1\,h^{-1}{\rm Mpc}$ the brightest galaxy bins, with the exception of Bin 1, all possess a strong relative anti-bias, although Bin 1 does show a sharp dip at about the same scale. On still smaller scales, $r < 100\,h^{-1}{\rm kpc}$, the relative bias becomes strongly positive. Since we are primarily interested in the luminosity dependence of the large-scale bias, we defer a study of this scale dependence to future work.

We now focus on the large-scale relative bias. In most observational studies the relative bias is computed with respect to the characteristic luminosity $L_*$ of the survey. For the SAM galaxies, this approximately corresponds to the galaxies in Bin 3. For this work, our operational definition of 'large scales' is $5\,h^{-1}{\rm Mpc} < r < 30\,h^{-1}{\rm Mpc}$.
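The relative bias and its large-scale value can be sketched as below, taking $b_{\rm rel}$ to be the square root of the ratio of the two correlation functions and averaging it over the range $5 < r < 30\,h^{-1}{\rm Mpc}$. Both the simple unweighted average over bins and the synthetic power-law correlation functions are assumptions made for illustration.

```python
import numpy as np

def relative_bias(xi_mu, xi_nu):
    """Scale-dependent relative bias sqrt(xi_mu(r) / xi_nu(r)).
    Only meaningful where both correlation functions are positive."""
    return np.sqrt(np.asarray(xi_mu, dtype=float) / np.asarray(xi_nu, dtype=float))

def large_scale_bias(r, xi_mu, xi_nu, r_min=5.0, r_max=30.0):
    """Unweighted mean of b_rel over r_min < r < r_max (h^-1 Mpc)."""
    r = np.asarray(r, dtype=float)
    sel = (r > r_min) & (r < r_max)
    return relative_bias(xi_mu, xi_nu)[sel].mean()

# Synthetic power-law correlation functions with hypothetical amplitudes
r = np.logspace(np.log10(0.05), np.log10(50.0), 40)   # same binning as in the text
xi_ref = (r / 5.0)**-1.8                             # stands in for the Bin 3 sample
xi_bright = 2.25 * xi_ref                            # relative bias of 1.5 by construction
print(large_scale_bias(r, xi_bright, xi_ref))
```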

Figure 4 shows $b_{\rm rel}(L_\mu, L_\nu = {\rm Bin}\ 3)$, and the luminosity dependence of the large-scale relative bias measured from the SAM galaxies is represented by the connected blue points. This may be compared with the results for the 2dFGRS obtained by Norberg et al. (2002):

$$\frac{b}{b_*} = 0.85 + 0.15\,\frac{L}{L_*}, \tag{85}$$
Figure 2. Correlation functions estimated from the Croton et al. (2006) semi-analytic galaxy catalogue in the Millennium simulation as a function of radial separation. The different coloured solid symbols show the results for the 8 absolute magnitude bins described in Table 1.




Figure 3. Relative scale dependence of galaxy bias measured for the different galaxy populations in the Millennium simulation semi-analytic galaxy catalogues of Croton et al. (2006). The relative bias is defined with respect to the lowest luminosity galaxy bin. The connected coloured points show the results for the 8 magnitude bins presented in Table 1.



and is represented in the figure by the red dashed line.

Interestingly, we see that the relative bias for the SAM galaxies is much flatter for faint objects than one finds for the 2dFGRS. The relative bias appears to have a minimum for $L_*$ galaxies and then increases slightly for fainter objects, whereas the bias steadily decreases for the 2dFGRS. However, the SAM galaxies do correctly capture the trend that the brightest galaxies in the 2dFGRS are more strongly correlated than the fainter ones. Thus, whilst the SAM galaxies are able to reproduce the GLF, they appear to capture the luminosity dependence of the clustering in the 2dFGRS only qualitatively. This failure of the Croton et al. (2006) model to correctly capture the luminosity dependence of the clustering has been noted in previous studies (Li et al. 2007; Kim et al. 2009; Guo et al. 2011). These studies have attributed the discrepancy between the observations and the model to too many faint satellite galaxies being placed in high-mass haloes.

We may also obtain a prediction for the luminosity dependence of the bias using the CLF approach of Yang et al. (2003). On rewriting Eq. (53) in terms of the CLF we find

$$b(L_\mu) = \frac{1}{\phi(L_\mu)}\int dM\,n(M)\,b(M)\,\Phi(L_\mu|M). \tag{86}$$



We have evaluated the above integral using the model described in §5.3, and the results are represented by the dot-dashed line in Fig. 4. Clearly, this model appears to accurately reproduce the luminosity dependence of the clustering. However, this is not too remarkable, since the model was optimised using these data. The salient point is that we are able to reproduce the 2dFGRS results by evaluating Eq. (86).
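Numerically, the CLF prediction for the bias is a CLF-weighted average of the halo bias over the mass function. The sketch below assumes the form $b(L) = \int dM\,n(M)\,b(M)\,\Phi(L|M) \big/ \int dM\,n(M)\,\Phi(L|M)$, integrated in $\ln M$; the mass function, halo bias and CLF in the example are hypothetical toys, not the ingredients used in the paper.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def bias_of_luminosity(L, clf, dn_dlnM, halo_bias, lnM_min=np.log(1e9),
                       lnM_max=np.log(1e16), n_mass=2000):
    """b(L) as the n(M) Phi(L|M)-weighted mean of the halo bias b(M)."""
    lnM = np.linspace(lnM_min, lnM_max, n_mass)
    M = np.exp(lnM)
    L = np.atleast_1d(np.asarray(L, dtype=float))
    weight = dn_dlnM(M)[None, :] * clf(L[:, None], M[None, :])   # per unit ln M
    return trapezoid(weight * halo_bias(M)[None, :], lnM) / trapezoid(weight, lnM)

# Hypothetical toy ingredients, for illustration only
toy_mass_function = lambda M: 1e-3 * (M / 1e12)**-0.9 * np.exp(-M / 1e14)
toy_halo_bias = lambda M: 0.7 + 0.3 * np.sqrt(M / 1e12)

def toy_clf(L, M):
    Lc = 1e10 * (M / 1e12)**0.3          # brighter galaxies favour heavier haloes
    return np.exp(-L / Lc) / Lc

print(bias_of_luminosity([1e9, 1e10, 5e10], toy_clf, toy_mass_function, toy_halo_bias))
```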



6 COVARIANCE OF THE GALAXY 
LUMINOSITY FUNCTION 

In this section we test our theoretical model for the covari- 
ance matrix of the GLF estimates. We start with the volume 
limited sample, and then move on to the more complex sce- 
nario of the flux limited sample. 



6.1 Results: Volume limited samples 

We use the SAM galaxy catalogues to construct an estimate of the covariance matrix of the GLF in volume limited samples. We do this by following the approach for computing the cluster count covariance described in Smith & Marian (2011). Briefly, we take the full volume of the Millennium simulation mock and slice it into $n^3$ cubical cells. On taking $n = 4$ we have 64 quasi-independent sub-volumes of size $L = 125\,h^{-1}{\rm Mpc}$. For each of these sub-volumes we estimate the GLF in 27 equal logarithmically spaced luminosity bins using Eq. (1). From these 64 estimates we construct the covariance matrix using the simple unbiased estimator:

$$\hat{C}_{\mu\nu} = \frac{1}{n^3-1}\sum_{i=1}^{n^3}\left[\hat\phi^{(i)}_\mu - \bar\phi_\mu\right]\left[\hat\phi^{(i)}_\nu - \bar\phi_\nu\right], \tag{87}$$
Figure 4. Relative large-scale bias of galaxies as a function of luminosity. The dashed red line represents the results from the 2dFGRS presented in Norberg et al. (2002); the connected open blue points denote our estimates from the semi-analytic galaxy catalogues in the Millennium simulation (Croton et al. 2006); and the magenta dot-dashed line denotes the results from the conditional luminosity function model of Yang et al. (2003), which were constrained to match the 2dFGRS results.
Figure 5. Fractional errors in the galaxy luminosity function for a volume limited survey as a function of galaxy luminosity. The circular open points represent estimates obtained from the SAM galaxies. The solid blue line presents the total prediction of the theoretical model given by Eq. (36). The red dashed line denotes the contribution to the error from the sample variance; the green dot-dashed line corresponds to the error coming from the halo occupancy covariance; the magenta dotted line corresponds to the error from Poisson noise.



where

$$\bar\phi_\mu = \frac{1}{n^3}\sum_{i=1}^{n^3}\hat\phi^{(i)}_\mu. \tag{88}$$



We are interested in exploring errors for a survey with volume $V \approx 0.125\,h^{-3}{\rm Gpc}^3$; however, the above procedure provides us with the covariance matrix for survey volumes of order $V/n^3 \approx 1.95\times10^{-3}\,h^{-3}{\rm Gpc}^3$. We circumvent this problem by approximating the covariance for the large volume by the covariance on the mean, i.e.

$$C_{\mu\nu}(V) \approx \bar{C}_{\mu\nu}(V/n^3) = C_{\mu\nu}(V/n^3)/n^3. \tag{89}$$
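The estimator of Eqs. (87)-(89) is straightforward to implement. The sketch below takes an array of GLF estimates, one row per sub-volume, and returns the mean, the sub-volume covariance with the $1/(n^3-1)$ normalisation, the covariance on the mean used as a proxy for the full volume, and the correlation matrix. The mock input, in which a fluctuation common to all bins stands in for sample variance, is hypothetical.

```python
import numpy as np

def subvolume_covariance(phi_hat):
    """Mean GLF, covariance, covariance on the mean and correlation matrix.

    phi_hat : array of shape (n_sub, n_bins), one GLF estimate per sub-volume.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)
    n_sub = phi_hat.shape[0]
    mean = phi_hat.mean(axis=0)
    delta = phi_hat - mean
    cov_sub = delta.T @ delta / (n_sub - 1)      # unbiased sub-volume covariance
    cov_full = cov_sub / n_sub                   # covariance on the mean
    sig = np.sqrt(np.diag(cov_sub))
    corr = cov_sub / np.outer(sig, sig)          # r_{mu nu}
    return mean, cov_sub, cov_full, corr

# Hypothetical mock: 64 sub-cubes, 27 bins sharing a common large-scale mode
rng = np.random.default_rng(1)
truth = np.logspace(-1, -4, 27)
common = rng.normal(0.0, 0.05, size=(64, 1))     # coherent, sample-variance-like part
noise = rng.normal(0.0, 0.02, size=(64, 27))     # uncorrelated, Poisson-like part
phi_hat = truth * (1.0 + common + noise)
mean, cov_sub, cov_full, corr = subvolume_covariance(phi_hat)
print(np.round(corr[:3, :3], 2))
```

`np.cov(phi_hat, rowvar=False)` returns the same sub-volume covariance; the explicit form is kept to mirror Eq. (87).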



Furthermore, in order to make predictions from the theory we must compute $\sigma^2_V$, i.e. Eq. (29). This requires us to specify the survey window function. As described in Smith & Marian (2011), one must be quite careful when computing this: if one wants to compare predictions with results from simulations, then one needs to use the exact density modes that are in the box; if one wants to make predictions for the real Universe, then the simulations fail to capture this correctly when the box length $L$ is comparable with the dimensions of the survey, and in this case one should use theoretical predictions. Since here we are comparing with N-body simulations, a good approximation is to interpret the survey volume as being spherical in the following way:

$$R = \left(\frac{3V}{4\pi}\right)^{1/3}, \tag{90}$$

and take the window function to be

$$W(k|R) = \frac{3}{y^3}\left[\sin y - y\cos y\right];\quad y \equiv kR. \tag{91}$$

Hence, the volume variance takes the simple form

$$\sigma^2_V = \int\frac{dk\,k^2}{2\pi^2}\,|W(k|R)|^2\,P(k). \tag{92}$$
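A sketch of Eqs. (90)-(92) is given below. It assumes a linear power spectrum $P(k) \propto k^{n_s}T^2(k)$ with the BBKS transfer function of Bardeen et al. (1986) and shape parameter $\Gamma = \Omega_m h$ (no baryon correction), normalised to $\sigma_8$, using the Millennium parameters quoted in §5.2. These modelling choices are assumptions for illustration and need not match the power spectrum used to produce the figures.

```python
import numpy as np

def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def tophat_window(y):
    """Fourier transform of a spherical top-hat, W = 3 (sin y - y cos y) / y^3."""
    y = np.asarray(y, dtype=float)
    small = y < 1e-3
    ys = np.where(small, 1.0, y)                 # avoid 0/0 in the unused branch
    return np.where(small, 1.0 - y**2 / 10.0, 3.0 * (np.sin(ys) - ys * np.cos(ys)) / ys**3)

def bbks_transfer(k, Gamma):
    """BBKS transfer function; k in h Mpc^-1, shape parameter Gamma."""
    q = k / Gamma
    return (np.log1p(2.34 * q) / (2.34 * q)
            * (1.0 + 3.89 * q + (16.1 * q)**2 + (5.46 * q)**3 + (6.71 * q)**4)**-0.25)

def sigma2_sphere(R, k, pk):
    """sigma^2(R) = int dk k^2/(2 pi^2) |W(kR)|^2 P(k), integrated in ln k."""
    integrand = k**3 / (2.0 * np.pi**2) * tophat_window(k * R)**2 * pk
    return trapezoid(integrand, np.log(k))

# Assumed cosmology: Millennium values quoted in the text, Gamma = Omega_m h
omega_m, h, n_s, sigma8 = 0.25, 0.73, 1.0, 0.9
k = np.logspace(-5, 3, 20000)                    # h Mpc^-1
pk = k**n_s * bbks_transfer(k, omega_m * h)**2
pk *= sigma8**2 / sigma2_sphere(8.0, k, pk)      # normalise to sigma_8

V = 125.0**3                                     # one sub-cube, h^-3 Mpc^3
R = (3.0 * V / (4.0 * np.pi))**(1.0 / 3.0)       # Eq. (90)
print(f"R = {R:.1f} h^-1 Mpc, sigma_V = {np.sqrt(sigma2_sphere(R, k, pk)):.4f}")
```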



6.1.1 Diagonal errors 

Figure 5 shows the square root of the diagonal elements of the covariance matrix divided by the ensemble-averaged GLF estimates from the 64 sub-cubes in the Millennium simulation (open points), as a function of luminosity. In this figure we also compare these results with the theoretical predictions from Eq. (36), where we have used the CLF model of Yang et al. (2003) as the model input. From Eq. (36) we find that

$$\frac{C_{\mu\mu}}{\phi_\mu^2} = b_\mu^2\,\sigma^2_V + \frac{1}{N_\mu} + \frac{\Sigma_{\mu\mu}}{N_\mu}. \tag{93}$$



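The above expression informs us that, in the limit where the sample variance is dominant, which for this case occurs when $N_\mu > 10^4$, we have

$$\frac{\sqrt{C_{\mu\mu}}}{\phi_\mu} \approx b_\mu\,\sigma_V, \tag{94}$$

and in the limit where the Poisson noise is dominant, which for this case occurs when $N_\mu < 10^4$, we have

$$\frac{\sqrt{C_{\mu\mu}}}{\phi_\mu} \approx \frac{1}{\sqrt{N_\mu}}. \tag{95}$$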
Figure 6. Relative importance of the halo occupancy variance (cf. Eq. 49) with respect to the Poisson errors, as a function of the luminosity bin. Poisson errors dominate when this ratio is ≪ 1. Note that this is independent of survey volume.
Figure 7. Relative contributions of the sample and halo occupancy covariance to the correlation matrix, for a volume limited sample of galaxies. The upper left triangle represents the sample variance plus Poisson noise contribution, with the halo occupancy covariance set to zero. The lower right triangle represents the halo occupancy covariance plus Poisson noise contribution, with the sample variance set to zero.



The luminosity dependence of the halo occupancy variance scales as

$$\frac{\sqrt{C_{\mu\mu}}}{\phi_\mu} \approx \sqrt{\frac{\Sigma_{\mu\mu}}{N_\mu}}, \tag{96}$$

where $\Sigma_{\mu\mu}$ is given by Eq. (49). We see from the denominator that this term scales in a similar way to the Poisson noise.
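The three contributions to the diagonal errors can be tabulated with a few lines of code. The sketch assumes the budget takes the form $C_{\mu\mu}/\phi_\mu^2 = b_\mu^2\sigma_V^2 + 1/N_\mu + \Sigma_{\mu\mu}/N_\mu$, consistent with the limits in Eqs. (94)-(96). The inputs ($b = 1$, $\sigma_V^2 = 10^{-4}$, $\Sigma_{\mu\mu} = 0.7$) are illustrative values, chosen so that the crossover sits near $N_\mu \sim 10^4$ and $\Sigma_{\mu\mu}$ lies within the 0.6 to 0.8 range reported below for $L < L_*$.

```python
import numpy as np

def fractional_error_budget(N, b, sigma2_V, Sigma):
    """Squared fractional GLF errors per bin, split into three terms:
    sample variance b^2 sigma_V^2, Poisson 1/N and halo occupancy Sigma/N."""
    N, b, Sigma = (np.asarray(a, dtype=float) for a in (N, b, Sigma))
    terms = {
        "sample": b**2 * sigma2_V,
        "poisson": 1.0 / N,
        "occupancy": Sigma / N,
    }
    total = sum(terms.values())
    return total, terms

N = np.array([1e2, 1e4, 1e6])
total, terms = fractional_error_budget(N, b=np.ones(3), sigma2_V=1e-4, Sigma=0.7 * np.ones(3))
for name, t in terms.items():
    print(name, np.round(np.sqrt(t), 4))
print("total", np.round(np.sqrt(total), 4))
```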

Figure 5 shows that the theory and the measurements from the SAM are in excellent agreement. Furthermore, the above limiting cases are clearly demonstrated by the data. Note that at low luminosities the errors in the SAM data appear to be slightly in excess of the theoretical predictions. From Eq. (94) we see that this can be attributed to the fact that, for faint galaxies, the bias in the SAM exceeds the bias obtained from the CLF approach (cf. the discussion surrounding Fig. 4).

The above results are for a particular choice of survey volume $V_s$. In principle, for a sufficiently large survey, $V_s\sigma^2(V_s) \rightarrow 0$, we are left with just the halo occupancy covariance and the Poisson noise. Figure 6 presents the ratio of the halo occupancy variance to the Poisson noise, i.e. $\Sigma_{\mu\mu}$. This demonstrates that, for the brightest galaxies in a survey, the counts are dominated by Poisson errors. However, for the fainter galaxies, $L < L_*$, the halo occupancy variance is roughly 0.6 to 0.8 times the Poisson noise, independent of the survey volume. Note that this fractional relation between the halo occupancy variance and the Poisson errors holds exactly for both volume and flux-limited surveys.



6.1.2 Correlation matrix 

Figure 7 presents the relative contributions to the correlation matrix. The top left triangle shows the sample covariance plus Poisson noise correlation matrix, with the halo occupancy covariance set to zero. The bottom right triangle shows the halo occupancy covariance plus Poisson noise, with the sample variance set to zero. This demonstrates that, for a volume limited 2dFGRS-like survey with volume $V = 0.125\,h^{-3}{\rm Gpc}^3$, the off-diagonal elements of the covariance matrix are entirely dominated by the sample variance term. However, following our earlier discussion in §4.3, we point out that if our survey were sufficiently large, such that $V_s\sigma^2(V_s) \rightarrow 0$, then the matrix would still be correlated, and this correlation would be given exactly by the bottom right triangle of Fig. 7. The figure shows that the minimum correlation coefficient that could be obtained for galaxies with $L < L_*$ is roughly $r \sim 0.2$ to $0.4$.

The left panel of Fig. 8 presents the correlation matrix constructed from the estimates of the covariance matrix obtained through application of Eq. (87) to the SAM data. The results show that the GLF estimates for galaxies with luminosities $L < L_*$ are almost perfectly correlated, i.e. $r \sim 1$. This is a somewhat startling result, as it means that if one of these bins fluctuates upwards with respect to the mean, then all the other bins with $L < L_*$ share the same upward fluctuation. As we will discuss later, this has broad implications for how one fits models to the measured GLF data.

The right panel of Fig. 8 presents our theoretical predictions for the correlation matrix, evaluated using Eq. (41) and the CLF model of Yang et al. (2003). We find that the theoretical predictions are in remarkably good agreement with the estimates from the SAM, although they are slightly more correlated than the measurements. This might be attributed to the mismatch in the luminosity dependence of the galaxy bias between the SAM and the 2dFGRS CLF model.
Figure 8. Correlation matrix of luminosity function estimates for a volume limited sample of galaxies. Left panel: results obtained from the semi-analytic galaxies in the Millennium simulation. Right panel: results obtained from the theoretical model described in Eq. (41).



6.2 Results: Flux limited samples 

Having validated our theoretical model for the covariance 
matrix of the GLF, we now turn to the slightly more com- 
plicated case of predicting the covariance matrix for a flux 
limited survey. 

As described in §4, we must take into account that, for a flux-limited survey, the observed volume depends on the luminosity of the objects in question and on the flux limit. We shall take our fiducial survey to have an angular coverage of roughly $\Omega_s \approx 1000\,{\rm deg}^2$. The survey volume will thus be $V^{\rm max}_\mu = \Omega_s\,\chi^3_{\rm max}(L_\mu)/3$, where $\chi_{\rm max}(L_\mu)$ is given by Eq. (52). In order to make predictions we also need to know $\sigma^2(L_\mu,L_\nu)$, and, to avoid dealing with the real, complex survey geometry, we shall make the approximation that the cone volume can be interpreted simply as a full-sky survey with radial dimension $R_{\rm max}(L_\mu) = [V^{\rm max}_\mu]^{1/3}$. Hence we employ the window function appropriate for a spherical top-hat, transformed into Fourier space:

$$W(k|L_\mu) = \frac{3}{y^3}\left[\sin y - y\cos y\right];\quad y = kR_{\rm max}(L_\mu). \tag{97}$$

Further, we shall take the flux limit to be that of the 2dFGRS: $b_J = 19.5$. In evaluating Eq. (52) we require the conversion from $b_J$-luminosity to $b_J$-absolute magnitude, and we do this using

$$M(L_\mu) = M_{\odot,b_J} - \frac{5}{2}\log_{10}\left(\frac{L_\mu}{L_\odot}\right), \tag{98}$$

where we have adopted $M_{\odot,b_J} = 5.3$. Thus, for galaxies with the characteristic luminosity of the 2dFGRS, $L_* = 9.64\times10^{9}\,h^{-2}L_\odot$, which corresponds to $M_{b_J} - 5\log_{10}h = -19.66$, the maximum distance out to which they may be observed is $\chi_{\rm max} \approx 680\,h^{-1}{\rm Mpc}$, and the corresponding volume is $V^{\rm max} \approx 0.13\,h^{-3}{\rm Gpc}^3$.
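The magnitude and distance quoted above can be checked with a short sketch. It assumes that, for Eq. (52), the maximum distance follows from the Euclidean distance modulus $m - M = 5\log_{10}(d/10\,{\rm pc})$, ignoring k-corrections and cosmological distance effects; with magnitudes expressed as $M - 5\log_{10}h$, the distance comes out in $h^{-1}{\rm Mpc}$. The function names are illustrative.

```python
import numpy as np

M_SUN_BJ = 5.3        # adopted solar absolute magnitude in b_J (from the text)

def abs_mag(L):
    """Eq. (98): M - 5 log10 h for a luminosity L in h^-2 L_sun."""
    return M_SUN_BJ - 2.5 * np.log10(L)

def chi_max(L, m_lim=19.5):
    """Maximum distance (h^-1 Mpc) at which a galaxy of luminosity L is brighter
    than the flux limit, using the Euclidean relation d = 10^((m - M + 5)/5) pc."""
    mu = m_lim - abs_mag(L)                      # distance modulus
    return 10.0**((mu + 5.0) / 5.0) / 1e6        # pc -> Mpc

L_star = 9.64e9
print(f"M* = {abs_mag(L_star):.2f}, chi_max = {chi_max(L_star):.0f} h^-1 Mpc")
```

For $L_*$ this gives $M_{b_J} - 5\log_{10}h \approx -19.66$ and $\chi_{\rm max} \approx 680\,h^{-1}{\rm Mpc}$, in line with the values quoted above.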



6.2.1 Diagonal errors

Figure 9 presents the predictions for the fractional errors for our fiducial survey. Considering again Eq. (69), we see that the fractional errors can be written as

$$\frac{C_{\mu\mu}}{\phi_\mu^2} = b_\mu^2\,\sigma^2(L_\mu,L_\mu) + \frac{1}{N_\mu} + \frac{\Sigma^{\rm FL}_{\mu\mu}}{N_\mu}. \tag{99}$$

As for the case of the volume limited survey, the fractional errors have three contributions: the sample variance, the Poisson noise and the halo occupancy covariance. These three terms may also be described by Eqs. (94)-(96).

In Fig. 9 we see that, as in the case of the volume limited sample, the fractional errors for the brightest galaxies are well described by the Poisson error term. However, for galaxies at the characteristic luminosity of the survey, the errors become dominated by the sample variance term. The interesting change from the volume limited survey is that, for the lower luminosity bins, whilst the sample variance term is still dominant, the contributions from the Poisson variance and the halo occupancy variance are also significant. This is because $V^{\rm max}_\mu$ is significantly smaller for galaxies with $L \lesssim L_*$ than for the case of the volume limited sample, which leads to an increase in the Poisson shot noise for these bins.² Finally, we recall that, since $\Sigma^{\rm FL}_{\mu\mu} = \Sigma_{\mu\mu}$, the relative strength of the halo occupancy covariance with respect to the Poisson noise is once more given by Fig. 6.

6.2.2 Correlation matrix

The top panel of Figure 10 presents the theoretical predictions for the relative contributions to the correlation matrix. The top left triangle shows the contributions to the correlation matrix that come from the sample covariance plus Poisson noise, with the halo occupancy covariance set to zero. The lower right triangle shows the same, but this time for the halo occupancy covariance plus Poisson noise, with the sample variance set to zero.

2 Note that we are rescaling our errors to a volume of a given fiducial size, and for our fiducial flux-limited survey the effective volume is reduced for all galaxies fainter than the brightest luminosity bin that we employ.
Figure 9. Fractional errors on the galaxy luminosity function for a survey of angular size $\Omega_s \approx 1000\,{\rm deg}^2$, with limiting magnitude $b_J = 19.5$, as a function of galaxy luminosity. The solid blue line presents the total prediction of the theoretical model given by Eq. (69). The red dashed line denotes the contribution to the error from the sample variance; the green dot-dashed line corresponds to the error coming from the halo occupancy covariance; the magenta dotted line corresponds to the error from Poisson noise.



The bottom panel of Fig. 10 shows the theoretical predictions for the total correlation matrix of GLF estimates, as given by Eq. (71). We see that the correlation matrix is almost diagonal for galaxies with $L > 6L_*$. However, for galaxies with lower luminosities, the matrix becomes strongly correlated. The correlations are not as strong as for the volume limited survey; however, $r \sim 1$ for luminosity bins that are relatively close to one another.

Clearly, for our fiducial survey, the correlation matrix is dominated by the sample covariance, with a relatively small fraction of the off-diagonal elements coming from the occupancy covariance. However, for a sufficiently large survey, with $\sqrt{V^{\rm max}_\mu V^{\rm max}_\nu}\,\sigma^2(L_\mu,L_\nu) \rightarrow 0$, the off-diagonal elements of the correlation matrix do not vanish, but are given by the lower right triangle of the top panel of Fig. 10.



7 IMPACT ON PARAMETER ESTIMATION 

We now explore the importance of including the covariance 
matrix when estimating GLF parameters from observations. 



7.1 Methodology 

Suppose we have estimated the GLF from our survey using either E1 or E2, depending on whether we have a volume or a flux limited sample. We now wish to interpret these estimates in terms of some model. To do this, let us adopt the Bayesian framework. The probability of obtaining a data vector $\mathbf{x}$, given our model $\mathcal{M}$ with parameters $\boldsymbol\theta$, is described
Figure 10. Correlation matrix of luminosity function estimates for a survey of angular size $\Omega_s \approx 1000\,{\rm deg}^2$ and with a flux limit $b_J = 19.5$. Top panel: relative contributions of the sample and halo occupancy covariance to the correlation matrix. The upper left triangle represents the sample variance plus Poisson noise contribution, with the halo occupancy set to zero. The lower right triangle shows the same for the halo occupancy plus Poisson noise covariance contribution, but this time with the sample variance set to zero. Bottom panel: total correlation matrix as given by Eq. (71).



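by the likelihood function $\mathcal{L}(\mathbf{x}|\boldsymbol\theta,\mathcal{M})$. A good choice for $\mathcal{L}$ is a multivariate Gaussian:

$$\mathcal{L}(\mathbf{x}|\boldsymbol\theta,\mathcal{M}) = \frac{1}{(2\pi)^{N/2}\sqrt{|\mathbf{C}|}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{T}\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol\mu)\right], \tag{100}$$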

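where $\boldsymbol\mu = \boldsymbol\mu(\boldsymbol\theta)$ and $\mathbf{C} = \mathbf{C}(\boldsymbol\theta)$ are the model mean and model data covariance matrix, both of which depend on the parameters $\boldsymbol\theta$, and $|\mathbf{C}|$ is the determinant of the matrix.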
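In practice Eq. (100) is best evaluated through a Cholesky factorisation rather than an explicit matrix inverse. The sketch below does this with SciPy's `cho_factor`/`cho_solve`; the three-bin example, which compares a coherent upward fluctuation under a strongly correlated and under a diagonal covariance, is hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglike(x, mu, C):
    """ln L of a multivariate Gaussian (Eq. 100) for data x, mean mu, covariance C."""
    r = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    c_and_lower = cho_factor(C)                      # C = L L^T, numerically stable
    chi2 = r @ cho_solve(c_and_lower, r)             # r^T C^-1 r without forming C^-1
    log_det = 2.0 * np.sum(np.log(np.diag(c_and_lower[0])))
    return -0.5 * (chi2 + log_det + r.size * np.log(2.0 * np.pi))

# Hypothetical 3-bin example: 10% errors, with and without r = 0.9 correlations
sig = np.array([0.1, 0.1, 0.1])
corr = np.full((3, 3), 0.9) + 0.1 * np.eye(3)
C_full = corr * np.outer(sig, sig)
C_diag = np.diag(sig**2)
x = np.array([1.1, 1.1, 1.1])                        # coherent upward fluctuation
mu = np.ones(3)
print(gaussian_loglike(x, mu, C_full), gaussian_loglike(x, mu, C_diag))
```

With strong bin-to-bin correlations a coherent shift of all bins is much less surprising than the diagonal-only likelihood suggests, which illustrates why neglecting the off-diagonal terms can distort parameter fits.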
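Using Bayes' theorem, the likelihood is directly related to the posterior probability distribution:

$$p(\boldsymbol\theta|\mathbf{x},\mathcal{M}) = \frac{\Pi(\boldsymbol\theta|\mathcal{M})\,\mathcal{L}(\mathbf{x}|\boldsymbol\theta,\mathcal{M})}{p(\mathbf{x}|\mathcal{M})}, \tag{101}$$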



where $\Pi(\boldsymbol\theta|\mathcal{M})$ is the set of model priors, and $p(\mathbf{x}|\mathcal{M})$ is termed the evidence, which can simply be written as a normalisation criterion: $p(\mathbf{x}|\mathcal{M}) = \int d\boldsymbol\theta\,\Pi(\boldsymbol\theta|\mathcal{M})\,\mathcal{L}(\mathbf{x}|\boldsymbol\theta,\mathcal{M})$. The errors on the model parameters may be obtained through exploration of the posterior distribution in the usual way (Press et al. 1992; Lewis & Bridle 2002; Heavens 2009). Different models $\mathcal{M}_1$ and $\mathcal{M}_2$ may then be compared using Bayesian model selection methods (Heavens 2009).

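If the priors $\Pi(\boldsymbol\theta)$ are flat, then the posterior $p(\boldsymbol\theta|\mathbf{x})$ is simply proportional to the likelihood $\mathcal{L}$. Close to its maximum, at $\boldsymbol\theta_0$, we may Taylor expand the logarithm of the posterior (for flat priors, the log-likelihood) to obtain

$$\ln p(\boldsymbol\theta|\mathbf{x}) = {\rm const} + \ln\mathcal{L}(\mathbf{x}|\boldsymbol\theta_0) - \frac{1}{2}\sum_{\alpha\beta}\mathcal{H}_{\alpha\beta}(\boldsymbol\theta_0)\,\Delta\theta_\alpha\Delta\theta_\beta + \dots, \tag{102}$$

where $\Delta\theta_\alpha = (\theta_\alpha - \theta_{\alpha,0})$ are deviations of the parameters from the fiducial values, $\mathcal{H}_{\alpha\beta} = -\partial^2\ln\mathcal{L}/\partial\theta_\alpha\partial\theta_\beta$ is the Hessian matrix, and the first derivative vanishes at the maximum. We may rewrite the above expression for the posterior as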



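$$p(\boldsymbol\theta|\mathbf{x}) \approx \frac{\Pi(\boldsymbol\theta)}{p(\mathbf{x})}\,\mathcal{L}(\boldsymbol\theta_0)\exp\left[-\frac{1}{2}\sum_{\alpha\beta}\mathcal{H}_{\alpha\beta}\,\Delta\theta_\alpha\Delta\theta_\beta\right]. \tag{103}$$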

Thus $\mathcal{H}_{\alpha\beta}$ informs us about the errors on the parameters and about how different parameters are correlated with each other through their effects on the data.

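For the case of a multivariate Gaussian posterior, the marginalised error on parameter $\theta_\alpha$ is given by $\sigma_\alpha = \sqrt{(\mathcal{H}^{-1})_{\alpha\alpha}}$. Since the likelihood itself depends on the data, it is also a random variable. Taking an ensemble average over many realisations of the data, we arrive at the Fisher matrix:

$$\mathcal{F}_{\alpha\beta} = \langle\mathcal{H}_{\alpha\beta}\rangle = -\left\langle\frac{\partial^2\ln\mathcal{L}}{\partial\theta_\alpha\partial\theta_\beta}\right\rangle. \tag{104}$$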



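From the Fisher matrix one may obtain the expected marginalised error on parameter $\theta_\alpha$ and the covariance between parameters $(\theta_\alpha, \theta_\beta)$:

$$\sigma_{\alpha\beta} = (\mathcal{F}^{-1})_{\alpha\beta};\quad \sigma_\alpha = \sqrt{(\mathcal{F}^{-1})_{\alpha\alpha}}. \tag{105}$$

For a derivation of these error bounds see Heavens (2009).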

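Under the assumption that the likelihood is Gaussian in the data, cf. Eq. (100), it can be shown that the Fisher matrix takes on the special form (Tegmark et al. 1997):

$$\mathcal{F}_{\alpha\beta} = \frac{1}{2}{\rm Tr}\left[\mathbf{C}^{-1}\mathbf{C}_{,\alpha}\mathbf{C}^{-1}\mathbf{C}_{,\beta} + \mathbf{C}^{-1}\mathbf{M}_{\alpha\beta}\right], \tag{106}$$

where $\mathbf{C}_{,\alpha} \equiv \partial\mathbf{C}/\partial\theta_\alpha$ and $\mathbf{M}_{\alpha\beta} \equiv \boldsymbol\mu_{,\alpha}\boldsymbol\mu_{,\beta}^{T} + \boldsymbol\mu_{,\beta}\boldsymbol\mu_{,\alpha}^{T}$.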
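The Gaussian Fisher matrix can be evaluated for any model mean $\boldsymbol\mu(\boldsymbol\theta)$ and covariance $\mathbf{C}(\boldsymbol\theta)$ with finite-difference derivatives, as in the sketch below. It uses the identity $\frac{1}{2}{\rm Tr}[\mathbf{C}^{-1}(\boldsymbol\mu_{,\alpha}\boldsymbol\mu_{,\beta}^T + \boldsymbol\mu_{,\beta}\boldsymbol\mu_{,\alpha}^T)] = \boldsymbol\mu_{,\alpha}^T\mathbf{C}^{-1}\boldsymbol\mu_{,\beta}$, valid for symmetric $\mathbf{C}$; the function name, step size and toy models are illustrative assumptions.

```python
import numpy as np

def fisher_matrix(theta, mean_fn, cov_fn, step=1e-4):
    """Gaussian Fisher matrix
        F_ab = 1/2 Tr[C^-1 C_,a C^-1 C_,b] + mu_,a^T C^-1 mu_,b
    with derivatives from central finite differences (relative step)."""
    theta = np.asarray(theta, dtype=float)
    n_par = theta.size
    Cinv = np.linalg.inv(cov_fn(theta))
    dmu, dC = [], []
    for a in range(n_par):
        h = step * max(abs(theta[a]), 1.0)
        tp, tm = theta.copy(), theta.copy()
        tp[a] += h
        tm[a] -= h
        dmu.append((mean_fn(tp) - mean_fn(tm)) / (2.0 * h))
        dC.append((cov_fn(tp) - cov_fn(tm)) / (2.0 * h))
    F = np.empty((n_par, n_par))
    for a in range(n_par):
        for b in range(n_par):
            F[a, b] = (0.5 * np.trace(Cinv @ dC[a] @ Cinv @ dC[b])
                       + dmu[a] @ Cinv @ dmu[b])
    return F

# Toy example: straight-line mean with a fixed diagonal covariance
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
C0 = np.diag([1.0, 2.0, 4.0])
F = fisher_matrix([0.5, -1.0], lambda t: A @ t, lambda t: C0)
print("marginalised errors:", np.sqrt(np.diag(np.linalg.inv(F))))
```

For a mean that is linear in the parameters and a fixed covariance this reduces to $\mathbf{A}^T\mathbf{C}^{-1}\mathbf{A}$, and the marginalised errors of Eq. (105) follow from the diagonal of $\mathcal{F}^{-1}$.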



7.2 Best-fit Schechter function for SAM data 

As a concrete example of our parameter estimation procedure, we now find the best-fit Schechter function parameters that describe the SAM data of Croton et al. (2006). We take the data for the volume limited sample of SAM galaxies described in §5.2. Again, we divide the full simulation volume into 64 equal sub-cubes and estimate the GLF for each using estimator E1. We then construct the mean GLF and its covariance matrix, as described in §6.1. We shall estimate the best-fit parameters for a survey region equivalent to a single sub-cube of size $L = 125\,h^{-1}{\rm Mpc}$.

We adopt a Schechter function GLF model, as described by Eq. (75). As noted earlier, this has three parameters
Figure 11. Comparison of different Schechter function fits to semi-analytic model galaxy luminosity function data. The open points with errors denote the results from the Croton et al. (2006) model data. The red dot-dashed line and solid blue line correspond to the best-fit Schechter functions obtained when fitting using only the diagonal elements of the data covariance matrix and when using the full data covariance matrix, respectively.



= {Z/», a, </>,}. We treat L* and a as free parameters and 
fix the normalisation 0* by the constraint that we desire to 
recover the mean number density of galaxies in the volume 
that are above the luminosity cut L m i n : 



$$\bar{n}_{\rm gal} = \int_{L_{\rm min}}^{\infty} dL\, \Phi(L|\boldsymbol{\theta}) \tag{107}$$



For the Schechter function this constraint is realised as: 



$$\bar{n}_{\rm gal} = \phi_*\,\Gamma\left[\alpha+1,\, L_{\rm min}/L_*\right] \tag{108}$$



where $\Gamma[a, x] \equiv \int_x^\infty t^{a-1}e^{-t}\,dt$ is the upper incomplete Gamma function. We then construct the likelihood $\mathcal{L}$ as described by Eq. (100). This function is maximised with respect to the two free parameters using an adaptive grid search scheme.
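A minimal sketch of the normalisation constraint, Eq. (108). It assumes Eq. (75) has the standard form $\Phi(L)\,dL = \phi_*(L/L_*)^{\alpha}\exp(-L/L_*)\,dL/L_*$, consistent with Eq. (108), and a faint-end slope $\alpha > -2$. Because `scipy.special.gammaincc` is the regularised function and requires a positive first argument, negative $\alpha+1$ is handled with the recurrence $\Gamma[a+1,x] = a\,\Gamma[a,x] + x^a e^{-x}$. The helper names and the input values are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc


def upper_incomplete_gamma(a, x):
    """Upper incomplete Gamma function Gamma[a, x] for x > 0 and a > -1, a != 0.

    scipy.special.gammaincc is the *regularised* function and requires a > 0,
    so for -1 < a < 0 we step up once using
    Gamma[a + 1, x] = a Gamma[a, x] + x^a exp(-x).
    """
    if a > 0:
        return gamma(a) * gammaincc(a, x)
    if -1 < a < 0:
        return (gamma(a + 1) * gammaincc(a + 1, x) - x**a * np.exp(-x)) / a
    raise ValueError("this sketch only handles a > -1 with a != 0")


def schechter_phi_star(n_gal, l_star, alpha, l_min):
    """phi_* from Eq. (108): n_gal = phi_* Gamma[alpha + 1, L_min / L_*]."""
    return n_gal / upper_incomplete_gamma(alpha + 1.0, l_min / l_star)


# Hypothetical inputs (not the SAM values); L in units of 1e10 h^-2 L_sun
n_gal, l_star, alpha, l_min = 0.01, 1.0, -1.3, 0.1
phi_star = schechter_phi_star(n_gal, l_star, alpha, l_min)
print(phi_star)
```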

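We find the 1- and 2-$\sigma$ confidence regions of the likelihood surface by identifying the contours in the $\mathcal{L}$-surface that satisfy:

$$\mathcal{L} = \mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_0)\exp\left[-\Delta\chi^2/2\right] \tag{109}$$

where $\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_0)$ corresponds to the maximum of the likelihood and $\Delta\chi^2 = \{2.3, 6.17\}$ for the 1- and 2-$\sigma$ contours, respectively.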
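For two free parameters the $\chi^2$ cumulative distribution is $1 - e^{-\Delta\chi^2/2}$, so the thresholds above can be checked directly. The short sketch below (function names hypothetical) shows that $\Delta\chi^2 = 2.3$ and $6.17$ enclose about 68.3% and 95.4% of the probability, and that by Eq. (109) the corresponding contours sit at about 32% and 4.6% of the peak likelihood.

```python
import math


def enclosed_probability_2d(delta_chi2):
    """Probability enclosed by a Delta chi^2 contour for two free parameters.

    For 2 degrees of freedom the chi^2 cumulative distribution is 1 - exp(-x/2).
    """
    return 1.0 - math.exp(-0.5 * delta_chi2)


def contour_level(delta_chi2):
    """Height of the contour relative to the likelihood peak, from Eq. (109)."""
    return math.exp(-0.5 * delta_chi2)


for dchi2 in (2.3, 6.17):
    print(f"dchi2={dchi2}: P={enclosed_probability_2d(dchi2):.4f}, "
          f"L/L_max={contour_level(dchi2):.4f}")
```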

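The Fisher matrix approach of the previous section also provides us with a means for estimating the covariance matrix of the parameters. From Eq. (106), and for a constant covariance matrix, the Fisher matrix for the GLF is:

$$F_{\alpha\beta} = \sum_{\mu,\nu} \frac{\partial[L\Phi(L_\mu|\boldsymbol{\theta})]}{\partial\theta_\alpha}\left[\mathbf{C}^{-1}\right]_{\mu\nu}\frac{\partial[L\Phi(L_\nu|\boldsymbol{\theta})]}{\partial\theta_\beta} \tag{110}$$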



Figure 12. 2-D likelihood contours for the luminosity function parameters $L_*$ and $\alpha$. Left panel: results obtained from a full exploration of the likelihood surface. The solid red ellipse and dashed red ellipse correspond to the 1- and 2-$\sigma$ confidence regions, respectively; these are obtained when we use only the diagonal elements of the data covariance matrix in the parameter estimation. The solid blue ellipse and hatched blue ellipse show the same, but when the full data covariance matrix is used. Right panel: same as the left panel, except that the 1- and 2-$\sigma$ confidence regions are obtained using the Fisher matrix formalism, c.f. Eq. (110).

For the Schechter function parameters, the derivatives of interest are:

$$\frac{\partial\log[L\Phi(L|\boldsymbol{\theta})]}{\partial L_*} = \frac{L-(\alpha+1)L_*}{L_*^2} \tag{111}$$

$$\frac{\partial\log[L\Phi(L|\boldsymbol{\theta})]}{\partial \alpha} = \log\frac{L}{L_*} \tag{112}$$

$$\frac{\partial\log[L\Phi(L|\boldsymbol{\theta})]}{\partial \phi_*} = \frac{1}{\phi_*} \tag{113}$$
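The sketch below combines Eqs. (110)-(113). It assumes $L\Phi(L) = \phi_*(L/L_*)^{\alpha+1}e^{-L/L_*}$, the form implied by Eqs. (108) and (111)-(113), with natural logarithms; the bins, parameter values and 10% uncorrelated errors are hypothetical, and for simplicity all three parameters are treated as free, whereas in the text $\phi_*$ is fixed by Eq. (108). Luminosities are expressed in units of $10^{10}\,h^{-2}L_\odot$ to keep the Fisher matrix well conditioned.

```python
import numpy as np


def l_phi(lum, l_star, alpha, phi_star):
    """L Phi(L) = phi_* (L/L_*)^(alpha+1) exp(-L/L_*) for a Schechter function."""
    x = lum / l_star
    return phi_star * x ** (alpha + 1.0) * np.exp(-x)


def dlog_l_phi(lum, l_star, alpha, phi_star):
    """Columns: d ln[L Phi]/d L_*, d ln[L Phi]/d alpha, d ln[L Phi]/d phi_* (Eqs 111-113)."""
    d_lstar = (lum - (alpha + 1.0) * l_star) / l_star**2
    d_alpha = np.log(lum / l_star)
    d_phistar = np.full_like(lum, 1.0 / phi_star)
    return np.column_stack([d_lstar, d_alpha, d_phistar])


def schechter_fisher(lum, cov, l_star, alpha, phi_star):
    """Eq. (110) for a parameter-independent covariance `cov` of the data L Phi(L_mu)."""
    # Jacobian J[mu, a] = d[L Phi(L_mu)]/d theta_a = L Phi(L_mu) * d ln[L Phi]/d theta_a
    jac = l_phi(lum, l_star, alpha, phi_star)[:, None] * dlog_l_phi(lum, l_star, alpha, phi_star)
    # F = J^T C^{-1} J, using a linear solve rather than an explicit inverse
    return jac.T @ np.linalg.solve(cov, jac)


# Hypothetical set-up: 8 bins, L in units of 1e10 h^-2 L_sun, 10% uncorrelated errors
lum = np.logspace(-1.0, 0.7, 8)
theta = (1.0, -1.3, 0.012)
cov = np.diag((0.1 * l_phi(lum, *theta)) ** 2)
F = schechter_fisher(lum, cov, *theta)
sigma_marg = np.sqrt(np.diag(np.linalg.inv(F)))
print(sigma_marg)
```

Replacing the diagonal `cov` with a full covariance matrix, including the off-diagonal sample-variance terms, is all that is needed to reproduce the full-covariance case.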



Figure 11 shows the best-fit Schechter function obtained when we fit the SAM GLF data using the full data covariance matrix (solid blue line), which correctly takes into account the bin-to-bin correlations generated by the large-scale structure in the volume. The figure also shows the best-fit Schechter function obtained when we use only the diagonal elements of the data covariance matrix for the parameter estimation (red dot-dashed line). It can clearly be seen that when we use only the diagonal elements of the covariance matrix, the model is biased. This is because the fit gives more weight to the lower luminosity bins, for which the errors are significantly smaller than for the brighter bins. We may see this bias more clearly by exploring the likelihood surface directly.
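The toy example below, in which every number is hypothetical, illustrates a related aspect of this bias. When sample variance makes the errors coherent across bins, a coherent shift of the whole GLF, such as an over-dense survey volume would produce, costs much less $\chi^2$ under the full covariance than under its diagonal, so a diagonal-only fit mis-weights coherent residuals relative to bin-to-bin ones. The covariance model (a diagonal Poisson term plus a rank-one sample-variance term) is an assumption made for illustration.

```python
import numpy as np


def chi2(residual, cov):
    """Generalised chi^2 = r^T C^{-1} r."""
    return float(residual @ np.linalg.solve(cov, residual))


# Hypothetical toy covariance: diagonal Poisson term + rank-one sample-variance term
phi_b = np.array([5.0, 4.0, 3.0, 2.0, 1.0])      # phi_mu * b_mu (arbitrary units)
shot = np.array([0.05, 0.05, 0.05, 0.1, 0.3])    # Poisson variance per bin
sigma2 = 0.01                                     # variance of the mean overdensity in the volume
cov_full = np.diag(shot) + sigma2 * np.outer(phi_b, phi_b)
cov_diag = np.diag(np.diag(cov_full))             # keep only the diagonal

# Residual from a coherent 10% shift along the sample-variance mode
r = 0.1 * phi_b
chi2_full, chi2_diag = chi2(r, cov_full), chi2(r, cov_diag)
print(f"full: {chi2_full:.3f}  diagonal only: {chi2_diag:.3f}")
```

With these numbers the coherent shift costs roughly 0.9 in $\chi^2$ with the full covariance but about 2.6 with the diagonal alone.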

The left panel of Figure 12 shows the 2-D likelihood surface for the fitted parameters $L_*$ and $\alpha$, together with the 1- and 2-$\sigma$ confidence limits, obtained from a full exploration of the likelihood surface. The blue ellipses show the results obtained when using the full data covariance, and the red ellipses show the results when using only the diagonal elements of the covariance. The best-fit values of $\{L_*, \alpha\}$ from the two fits are only just consistent at the 2-$\sigma$ level. The best-fit parameters are:

$$\boldsymbol{\theta}^{\rm full\ cov.} = \begin{cases} L_* = 1.023\times10^{10}\,[h^{-2}L_\odot] \\ \alpha = -1.325 \\ \phi_* = 0.0122\,[h^{3}\,{\rm Mpc}^{-3}] \end{cases} \tag{114}$$

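$$\boldsymbol{\theta}^{\rm diag.\ cov.} = \begin{cases} L_* = 8.913\times10^{9}\,[h^{-2}L_\odot] \\ \alpha = -1.29 \\ \phi_* = 0.0144\,[h^{3}\,{\rm Mpc}^{-3}] \end{cases} \tag{115}$$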

In the right panel of Figure 12 we show the results
obtained when we employ the Fisher matrix formalism to 
calculate the parameter covariance matrix. Considering the 
case where we used the full data covariance matrix in the 
parameter estimation, we see that the Fisher matrix pre- 
dictions are in excellent agreement with the full likelihood 
exploration. However, for the case where we used only the 
diagonal elements of the covariance matrix in the fitting, 
we find that the Fisher matrix errors are only qualitatively 
consistent with the results for the full likelihood exploration. 



8 CONCLUSIONS 

In this paper we have investigated the galaxy luminosity 
function (GLF) and what determines its error properties for 
various commonly used estimators. 

In §2 we described several commonly used estimators for the GLF. We showed that Turner's estimator (Turner 1979), which attempts to correct for the effects of large-scale structure on the GLF, is actually a biased estimator.

In §3 we then focused on the simpler estimator of Schmidt (1968) for volume limited samples. Using a cluster expansion approach, we showed that this estimator is unbiased in the ensemble limit. We derived the covariance matrix of this estimator and found that it comprised three terms. The first term took account of the sample variance, which depends on the biases of the galaxies in the different luminosity bins and on the variance of matter fluctuations in the survey volume. The second term was a simple Poisson noise contribution. The third term, dubbed halo occupancy covariance, arose because several galaxies may be hosted by the same dark matter halo.
We proved that the necessary requirement for the sample covariance to vanish is $\sigma^2 V_s \rightarrow 0$.

In §4 we investigated the $1/V_{\rm max}$ estimator of Schmidt (1968) for flux-limited surveys. We showed that, in the ensemble limit, this is also an unbiased estimator, and we derived its covariance properties. As for the volume limited estimator, this matrix could also be decomposed into three terms: sample, Poisson and occupancy covariance terms. For the sample variance term, the major difference was that one must consider the cross-volume variance, since two distinct luminosity bins trace two different sample volumes. Again, the necessary condition for the sample variance to be subdominant was attained when $\sigma^2[L_\mu, L_\mu]\,V^{\rm max}_\mu \rightarrow 0$.

In §5 we described the semi-analytic model (SAM) galaxy catalogue of Croton et al. (2006) that we used to test our theoretical model. We also summarised the conditional luminosity function (CLF) model of Yang et al. (2003). We showed that, for the volume limited estimator, both the SAM and CLF models were able to reproduce the 2dFGRS GLF. We then investigated the luminosity dependence of the galaxy bias, and showed that in the SAM the correlation functions of the different luminosity-binned samples displayed a complicated scale dependence, although for $r > 3\,h^{-1}{\rm Mpc}$ the bias was reasonably flat. We measured the large-scale relative bias and found that the brightest luminosity bin, $L > 3\times10^{10}\,h^{-2}L_\odot$, showed a 50% larger bias than $L_*$ galaxies. For galaxies with $L < L_*$ we found that the SAM predicted a much flatter luminosity dependence of the bias than was measured in the 2dFGRS. The CLF model, by fiat, reproduced the 2dFGRS data. These results were in agreement with earlier work (Li et al. 2007; Kim et al. 2009; Guo et al. 2011).

In §6 we used the SAM galaxies to examine the fractional errors on the GLF estimates from the volume limited samples. We found that for the bright galaxies the fractional errors were much larger than for the fainter bins. For the fainter bins, however, the fractional error became flat below $L_*$: the errors were not reduced even though the number of galaxies per bin increased dramatically. These results were in excellent agreement with the predictions of our theoretical model. This plateau effect was explained by the sample variance being the dominant source of error for these luminosity bins.
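The plateau can be sketched with only the Poisson and sample-variance terms, assuming their standard forms $\bar{n}/V$ and $b^2\bar{n}^2\sigma^2$ for the variance of a bin's number density and neglecting the occupancy term. The function name and the numbers (including $\sigma^2 = 0.004$ for a $125\,h^{-1}{\rm Mpc}$ cube) are hypothetical. Once $\bar{n}V \gg 1/(b^2\sigma^2)$ the fractional error flattens at $b\sigma$, however many galaxies the bin contains.

```python
import numpy as np


def fractional_error(n_bar, bias, volume, sigma2):
    """Fractional error on a volume-limited GLF bin: Poisson + sample variance only.

    Var[n_hat] / n_bar^2 = 1 / (n_bar V) + b^2 sigma^2   (occupancy term neglected)
    """
    return np.sqrt(1.0 / (n_bar * volume) + bias**2 * sigma2)


# Hypothetical numbers: a (125 h^-1 Mpc)^3 cube and an assumed sigma^2 = 0.004
volume = 125.0**3
n_bar = np.logspace(-7, -1, 7)   # number density in the bin [h^3 Mpc^-3]
errs = fractional_error(n_bar, bias=1.0, volume=volume, sigma2=0.004)
print(np.round(errs, 3))
```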

Again using the SAM galaxies, we estimated the covariance matrix of the GLF estimates. We found that, for the Millennium simulation volume, the cross-correlation coefficient was $r < 0.5$ only for galaxies with $L > 5\times10^{10}\,h^{-2}L_\odot$. For lower luminosity galaxies $r > 0.5$, and at the faint end the matrix was almost perfectly correlated. We showed that the predictions of our theoretical model were again in excellent agreement with these measurements.
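The cross-correlation coefficient is $r_{\mu\nu} = C_{\mu\nu}/\sqrt{C_{\mu\mu}C_{\nu\nu}}$. The sketch below computes it for a hypothetical toy covariance (a diagonal Poisson term plus a rank-one sample-variance term, with bins ordered from faint to bright); the function name and numbers are illustrative. Bins dominated by sample variance come out strongly correlated, while Poisson-dominated bright bins do not.

```python
import numpy as np


def correlation_matrix(cov):
    """Cross-correlation coefficients r_{mu nu} = C_{mu nu} / sqrt(C_{mu mu} C_{nu nu})."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical toy covariance, bins ordered from faint (many galaxies) to bright (few)
phi_b = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
shot = np.array([0.05, 0.05, 0.05, 0.1, 0.3])
cov = np.diag(shot) + 0.01 * np.outer(phi_b, phi_b)
r = correlation_matrix(cov)
print(np.round(r, 2))
```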

We then used our theoretical model to make predictions for how the errors would change for a GLF estimated from a flux limited survey. For the fractional errors, the main difference from the volume limited sample was that, for the low luminosity bins, the errors increased with decreasing luminosity. This was due to the reduced survey volume for these bins. However, the sample variance was still dominant on these scales. Exploring the covariance matrix, we found that in this case the matrix was less correlated than for the volume limited sample. However, the matrix was still highly correlated for galaxies with $L < L_*$. Again, the off-diagonal covariance was attributed to the sample variance term.
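Why the faint bins see less volume can be sketched in a static Euclidean geometry, ignoring cosmology, K-corrections and any bright flux limit (all simplifying assumptions, not the paper's calculation; the function name is hypothetical). A source of luminosity $L$ stays above the flux limit $f_{\rm lim}$ out to $d_{\rm max} = \sqrt{L/4\pi f_{\rm lim}}$, so $V_{\rm max} \propto L^{3/2}$.

```python
import math


def euclidean_vmax(lum, flux_limit, solid_angle):
    """V_max for a source of luminosity `lum` in a flux-limited survey.

    Static Euclidean sketch: no cosmology, no K-corrections, no bright limit.
    d_max = sqrt(L / (4 pi f_lim)),  V_max = (Omega / 3) d_max^3
    """
    d_max = math.sqrt(lum / (4.0 * math.pi * flux_limit))
    return solid_angle / 3.0 * d_max**3


# Since V_max scales as L^(3/2), a bin 4x fainter is surveyed over 8x less volume
ratio = euclidean_vmax(1.0, 1e-3, 1.0) / euclidean_vmax(0.25, 1e-3, 1.0)
print(ratio)
```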

In §7 we explored the importance of including the full data covariance matrix when interpreting observations in terms of a given model. We showed that if one neglects the bin-to-bin covariances in the luminosity function, then parameter estimates will be biased. When fitting Schechter functions to data, we found that the most seriously affected parameter was the characteristic luminosity, which was systematically underestimated by 10-20%.
APPENDIX A: CLUSTER COUNT STATISTICS

In this appendix we calculate $\langle N^c_\alpha\rangle_{s,P}$ and $\langle N^c_\alpha N^c_\beta\rangle_{s,P}$. These derivations follow from Hu & Kravtsov (2003) and Smith & Marian (2011).

Consider some large cubical patch of the Universe, of volume $V_s$, containing $N$ clusters that possess some distribution of masses. Let us subdivide the cluster population into a set of $N_m$ mass bins, and denote the number of clusters in the $\alpha$th mass bin by $N^c_\alpha$. We shall assume that the probability that the volume contains $N^c_\alpha$ clusters in mass bin $\alpha$ is given by a Poisson distribution:

$$P(N^c_\alpha|m_\alpha) = \frac{m_\alpha^{N^c_\alpha}\exp(-m_\alpha)}{N^c_\alpha!} \tag{A1}$$



For any quantity $X$ that depends on the number of clusters, we denote the average over the sampling distribution (the Poisson process in this case) as $\langle X\rangle_P$. Thus, the average of $N^c_\alpha$ over the sampling distribution can be written:

$$\langle N^c_\alpha\rangle_P = m_\alpha = \bar{m}_\alpha\left[1 + b_\alpha\,\delta_V(\mathbf{x})\right] \tag{A2}$$

where $\bar{m}_\alpha$ is the expected number of counts averaged over the Poisson sampling distribution and the density fluctuations $\delta_V(\mathbf{x})$ in the volume. The volume of the survey and the volume-averaged overdensity field are written:



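$$V_s = \int d^3x'\, W(\mathbf{x}'|V_s) \tag{A3}$$

$$\delta_V(\mathbf{x}) = \frac{1}{V_s}\int d^3x'\, W(\mathbf{x}'|V_s)\,\delta(\mathbf{x}') \tag{A4}$$

where $W(\mathbf{x}|V_s)$ is the window function for the survey and $\bar{n}_\alpha$ and $b_\alpha$ are given by Eqs (18) and (28).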

Following Lima & Hu (2004) we take the likelihood of drawing a particular set of cluster counts in the mass bins, $\mathbf{N} \in \{N^c_1, \dots, N^c_{N_m}\}$, in the cells to be:

$$\mathcal{L}(\mathbf{N}|\bar{\mathbf{m}},\mathbf{S}) = \int d^{N_m}m \prod_{\alpha}P(N^c_\alpha|m_\alpha)\,G(\mathbf{m}|\bar{\mathbf{m}},\mathbf{S}) \tag{A5}$$

where $\mathbf{m} \in \{m_1, \dots, m_{N_m}\}$ is a model for the counts in the cells, and where it was assumed that the statistics of the volume-averaged density field are described by a multivariate Gaussian:



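$$G(\mathbf{m}|\bar{\mathbf{m}},\mathbf{S}) = \frac{1}{(2\pi)^{N_m/2}|\mathbf{S}|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{m}-\bar{\mathbf{m}})^{T}\mathbf{S}^{-1}(\mathbf{m}-\bar{\mathbf{m}})\right] \tag{A6}$$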

where $\mathbf{S}$ is defined to be

$$S_{\alpha\beta} = \left\langle(m_\alpha-\bar{m}_\alpha)(m_\beta-\bar{m}_\beta)\right\rangle_s \tag{A7}$$

Note that we refer to averages over the density field as sample averages; for a quantity $X$ they will be denoted $\langle X\rangle_s$.

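At this point we may be more precise about what we mean by ensemble and Poisson averages:

$$\langle X\rangle_{s,P} \equiv \sum_{N^c_1=0}^{\infty}\cdots\sum_{N^c_{N_m}=0}^{\infty}\mathcal{L}(\mathbf{N}|\bar{\mathbf{m}},\mathbf{S})\,X(\mathbf{N}) \tag{A8}$$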
Equation (A5) can be simplified in two limits. If $S_{\alpha\alpha} \ll \bar{m}_\alpha$, then the likelihood is the product of Poisson processes; alternatively, in the limit of a large number of counts in each cell, the Poisson process becomes close to Gaussian and the likelihood can be approximated as a Gaussian with shifted mean and augmented covariance matrix:

$$\mathcal{L}(\mathbf{N}|\bar{\mathbf{m}},\mathbf{S}) \approx G(\mathbf{N}|\bar{\mathbf{m}},\mathbf{C})\,,\qquad \mathbf{C} = \mathbf{M} + \mathbf{S} \tag{A9}$$

where $M_{\alpha\beta} = \delta_{\alpha\beta}\,\bar{m}_\alpha$. Note that in the above equation the approximate sign is used since negative number counts are formally forbidden (for a more detailed discussion of this see Hu & Cohn 2006).

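The second moment of the counts, from which their covariance follows, can be written

$$\langle N^c_\alpha N^c_\beta\rangle_{s,P} = \sum_{N^c_1=0}^{\infty}\cdots\sum_{N^c_{N_m}=0}^{\infty}\mathcal{L}(\mathbf{N}|\bar{\mathbf{m}},\mathbf{S})\,N^c_\alpha N^c_\beta = \int d^{N_m}m\, G(\mathbf{m}|\bar{\mathbf{m}},\mathbf{S})\left[m_\alpha m_\beta\,\epsilon_{\alpha\beta} + \left\langle(N^c_\alpha)^2\right\rangle_P\delta_{\alpha\beta}\right] \tag{A10}$$

where $\epsilon_{\alpha\beta} = 1$ when $\alpha \neq \beta$ and 0 otherwise. Consider the second term on the right-hand side of the above equation, and recall that for the Poisson distribution $\langle X^2\rangle = \langle X\rangle\left[1 + \langle X\rangle\right]$. Using this fact, coupled with $\bar{m}_\alpha = n(M_\alpha)\Delta M_\alpha V_s$, we find:

$$\langle N^c_\alpha N^c_\beta\rangle_{s,P} = \int d^{N_m}m\, G(\mathbf{m}|\bar{\mathbf{m}},\mathbf{S})\left[m_\alpha m_\beta + m_\alpha\delta_{\alpha\beta}\right] = S_{\alpha\beta} + \bar{m}_\alpha\bar{m}_\beta + \bar{m}_\alpha\delta_{\alpha\beta} \tag{A11}$$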
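Eq. (A11) can be checked by Monte Carlo under the model of Eq. (A5): draw the density-modulated means from $G(\mathbf{m}|\bar{\mathbf{m}},\mathbf{S})$, then Poisson counts about them. Subtracting $\langle N^c_\alpha\rangle\langle N^c_\beta\rangle = \bar{m}_\alpha\bar{m}_\beta$ from Eq. (A11) predicts a count covariance of $S_{\alpha\beta} + \bar{m}_\alpha\delta_{\alpha\beta}$, i.e. sample variance plus Poisson noise. The values of $\bar{\mathbf{m}}$ and $\mathbf{S}$ below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_real = 200_000

# Hypothetical mean counts m_bar and sample covariance S for two mass bins
m_bar = np.array([200.0, 100.0])
S = np.array([[100.0, 40.0],
              [40.0, 50.0]])

# Step 1: draw the density-modulated means m ~ G(m | m_bar, S)
m = rng.multivariate_normal(m_bar, S, size=n_real)
m = np.clip(m, 0.0, None)  # negative means are unphysical (negligible for these values)
# Step 2: draw Poisson counts N_alpha ~ P(N_alpha | m_alpha)
counts = rng.poisson(m)

measured = np.cov(counts, rowvar=False)
predicted = S + np.diag(m_bar)  # covariance implied by Eq. (A11)
print(measured)
print(predicted)
```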



APPENDIX B: INCORPORATING 
MAGNITUDE ERRORS 

The above analysis has so far included errors induced in the 
GLF that arise from large-scale structures and also the occu- 
pancy of galaxies in haloes. We now examine how the above 
results are modified in the presence of calibration errors in 
the magnitudes of the galaxies. Again, we shall look to the 
results from the 2dFGRS for illustration. 

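We take account of the mapping between the true luminosity $L$ and the observed luminosity $L^o$ in the following way: the observed GLF can be written

$$\phi(L^o_\mu) = \frac{1}{\Delta L^o_\mu}\int_{L^o_\mu}^{L^o_{\mu+1}} dL^o \int_0^\infty dL\, p(L^o|L)\,\phi(L) \tag{B1}$$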



where galaxies are observed with luminosities in the bin $L^o_\mu < L^o \le L^o_{\mu+1}$. In the above, the key new ingredient is the probability distribution for obtaining a luminosity $L^o$ given the underlying true luminosity $L$. In Norberg et al. (2002), the observed $b_J$-band magnitudes, $m^o$, of the 2dFGRS galaxies were found to have a calibration error that was well described by a Gaussian of width $\sigma_m = 0.15$ about the underlying true magnitude $m$. Hence,



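$$p(L^o|L) = p_G(m^o|m)\left|\frac{dm^o}{dL^o}\right| = \frac{1}{\sqrt{2\pi}\,\sigma_m}\exp\left[-\frac{(m-m^o)^2}{2\sigma_m^2}\right]\frac{5}{2L^o\ln 10} = \frac{5}{2\sqrt{2\pi}\,\sigma_m L^o\ln 10}\exp\left[-\frac{25\left(\log_{10}L/L^o\right)^2}{8\sigma_m^2}\right] \tag{B2}$$

where in the above equations we used the relation $L/L^o = 10^{-2(m-m^o)/5}$ to compute the Jacobian of the coordinate transformation: $dL^o/dm^o = -2L^o\ln 10/5$.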
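Because a Gaussian in magnitude is a log-normal in luminosity, a quick consistency check on Eq. (B2) is that it integrates to unity over $L^o$. The sketch below (the function name is hypothetical; $\sigma_m = 0.15$ is the 2dFGRS value quoted above) performs that check numerically. In $\log_{10}L^o$ the kernel has width $2\sigma_m/5 = 0.06$ dex, so the range $[0.2, 5]\,L$ covers it completely.

```python
import numpy as np
from scipy.integrate import quad

SIGMA_M = 0.15  # 2dFGRS b_J calibration scatter quoted in the text


def p_obs_given_true(l_obs, l_true, sigma_m=SIGMA_M):
    """p(L^o | L) of Eq. (B2): Gaussian scatter in magnitude mapped to luminosity."""
    log_ratio = np.log10(l_true / l_obs)
    norm = 5.0 / (2.0 * np.sqrt(2.0 * np.pi) * sigma_m * l_obs * np.log(10.0))
    return norm * np.exp(-25.0 * log_ratio**2 / (8.0 * sigma_m**2))


# Normalisation check over the observed luminosity, for a true luminosity L = 1
total, _ = quad(lambda l_obs: p_obs_given_true(l_obs, 1.0), 0.2, 5.0, points=[1.0])
print(total)
```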

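Thus, with magnitude error uncertainties included, the covariance matrix becomes

$$C[L^o_\mu, L^o_\nu] = \frac{1}{\Delta L^o_\mu\,\Delta L^o_\nu}\int_{L^o_\mu}^{L^o_{\mu+1}} dL^o_1\int_{L^o_\nu}^{L^o_{\nu+1}} dL^o_2\int dL_1\,p(L^o_1|L_1)\int dL_2\,p(L^o_2|L_2)\,C[L_1,L_2] \tag{B3}$$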



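On inserting Eq. (36) for the true covariance, the observed covariance can be written

$$C[L^o_\mu, L^o_\nu] = \sigma^2(V_s)\,\overline{\phi b_g}(L^o_\mu)\,\overline{\phi b_g}(L^o_\nu) + \frac{\phi(L^o_\mu)}{V_s\,\Delta L^o_\mu}\,\delta^K_{\mu\nu} + \frac{1}{V_s}\int dM_1\,n(M_1)\,\phi(L^o_\mu|M_1)\,\phi(L^o_\nu|M_1) \tag{B4}$$

where we have defined three new terms:

$$\phi(L^o_\mu) \equiv \frac{1}{\Delta L^o_\mu}\int_{L^o_\mu}^{L^o_{\mu+1}} dL^o_1\int_0^\infty dL_1\,p(L^o_1|L_1)\,\phi(L_1)\,; \tag{B5}$$

$$\overline{\phi b_g}(L^o_\mu) \equiv \frac{1}{\Delta L^o_\mu}\int_{L^o_\mu}^{L^o_{\mu+1}} dL^o_1\int_0^\infty dL_1\,p(L^o_1|L_1)\,\phi(L_1)\,b_g(L_1)\,; \tag{B6}$$

$$\phi(L^o_\mu|M) \equiv \frac{1}{\Delta L^o_\mu}\int_{L^o_\mu}^{L^o_{\mu+1}} dL^o_1\int_0^\infty dL_1\,p(L^o_1|L_1)\,\phi(L_1|M)\,. \tag{B7}$$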



In the limit where the luminosity bins are sufficiently narrow that the integrand does not vary across the bin, the first integral in the above equations may be approximated by the central value of the integrand, in accordance with the mean value theorem. Furthermore, since we take the error in the magnitude distribution to be a Gaussian of width $\sigma_m$, the limits of the second integral can be restricted to $L_{\rm min}(L^o)$ and $L_{\rm max}(L^o)$. Hence

$$\phi(L^o_\mu) \approx \int_{L_{\rm min}(L^o_\mu)}^{L_{\rm max}(L^o_\mu)} dL_1\,p(L^o_\mu|L_1)\,\phi(L_1)\,; \tag{B8}$$

$$\overline{\phi b_g}(L^o_\mu) \approx \int_{L_{\rm min}(L^o_\mu)}^{L_{\rm max}(L^o_\mu)} dL_1\,p(L^o_\mu|L_1)\,\phi(L_1)\,b_g(L_1)\,; \tag{B9}$$

$$\phi(L^o_\mu|M) \approx \int_{L_{\rm min}(L^o_\mu)}^{L_{\rm max}(L^o_\mu)} dL_1\,p(L^o_\mu|L_1)\,\phi(L_1|M)\,. \tag{B10}$$



In practice, the upper and lower bounds on the integrals are computed by allowing the minimum and maximum 'true' magnitudes that contribute to an observed magnitude bin to lie $4\sigma_m$ away from the observed magnitude. This gives

$$L_{\rm max}/L^o = 10^{-2(m-m^o)/5} = 10^{8\sigma_m/5} \tag{B11}$$

$$L_{\rm min}/L^o = 10^{-2(m-m^o)/5} = 10^{-8\sigma_m/5} \tag{B12}$$

On adopting the appropriate value for the 2dFGRS, $\sigma_m = 0.15$, this leads us to adopt the integral limits $L_{\rm max} = 1.74L^o$ and $L_{\rm min} = 0.575L^o$.
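The arithmetic of Eqs. (B11)-(B12) can be reproduced in a few lines (function names hypothetical). With $\sigma_m = 0.15$ and a $\pm4\sigma_m$ cut, the factor is $10^{0.24} \approx 1.738$, and the truncation discards only about $6\times10^{-5}$ of the Gaussian magnitude scatter.

```python
import math


def true_luminosity_limits(l_obs, sigma_m=0.15, n_sigma=4.0):
    """Eqs (B11)-(B12): L_min, L_max = L^o * 10^(-/+ 2 n_sigma sigma_m / 5)."""
    factor = 10.0 ** (2.0 * n_sigma * sigma_m / 5.0)
    return l_obs / factor, l_obs * factor


def gaussian_fraction_within(n_sigma):
    """Fraction of a Gaussian lying within +/- n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2.0))


l_min, l_max = true_luminosity_limits(1.0)
print(f"L_min = {l_min:.3f} L^o, L_max = {l_max:.3f} L^o, "
      f"enclosed fraction = {gaussian_fraction_within(4.0):.5f}")
```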